EXPRESSION SYSTEM FOR ALTERED EXPRESSION LEVELS



Related Applications 

This application is a continuation-in-part application of United States Serial 
Number 08/699,092 filed August 16, 1996, hereby incorporated by reference in its 
entirety. 

Field of the Invention 

The present invention relates to the discovery of the lipase regulation
cascade of Pseudomonas alcaligenes. Specifically, the present invention provides
the nucleic acid and amino acid sequences of various components of the lipase
regulation cascade which may be used in expression methods and systems
designed for the production of heterologous proteins.

Background of the Invention 

The isolation and identification of a microorganism that can naturally secrete
a product with potential for industrial production is one of the most vital steps,
if not the most vital, in the process of fermentation biotechnology. The ability to
secrete the protein of interest usually leads to easier downstream processing. The
next critical stage is the mutagenesis of a naturally occurring strain to a
hyper-producing strain. Over a number of years, scientists have developed screening
strategies from which a number of exo-protein producing bacteria have been isolated.
Following isolation, a large number of rounds of mutagenesis can be used to
continuously select higher producing strains. However, classical strain improvement
cannot be used indefinitely to further increase production levels. Therefore, a more
direct method of characterization and molecular genetic manipulation is needed to
achieve higher production levels.

Several patents and publications have claimed or described a lipase
modulator gene (WO 94/02617; EP 331,376; Nakanishi et al. (1991) Lipases-Struct.
Mech. Genet. Eng. GBF Monographs 16:263-266). However, later research has
shown that the product of the gene, now called lif, is concerned with folding of the
lipase rather than regulating the expression of the lipase. A review of various lipase
expression systems that use the lif gene product can be found in Jaeger et al.
(1994) FEMS Microbiol. Rev. 15:29-63.

Another publication discusses the sigma 54 promoter and the types of genes
that have been described to be under control of this type of promoter. Morett and
Segovia (1993) J. Bacter. 175:6067-6074.

The search has continued for an expression system that can efficiently
express a heterologous protein, particularly a lipase, in Pseudomonas, in particular
Pseudomonas alcaligenes. Expression of lipase in Pseudomonas is very difficult and
often occurs at lower levels than industry would like to see.

The present invention solves the problem of low levels of expression of 
proteins in Pseudomonas as well as other microbial hosts. 

Summary of the Invention 

The present invention relates to the discovery of a Pseudomonas lipase 
regulation cascade and provides individual components of the regulation cascade 
that can be used in expression systems for the production and secretion of proteins
in host cells. The regulation cascade comprises, surprisingly, a two-component part
that includes a kinase and a DNA binding regulator. The two components work in 
concert with a promoter and an upstream binding sequence to efficiently express a 
protein. The regulation cascade also comprises secretion factors that can be used 
in host cells to enhance the secretion of produced proteins. 

The present invention provides nucleic acid and amino acid sequences for
the various components of the Pseudomonas alcaligenes lipase regulation cascade.
The present invention also provides new, efficient expression systems, i.e., 
expression vectors, and host cells that can be used to express proteins at increased 
levels. The new expression systems allow for increased expression of a protein
whose gene is functionally linked to components of the expression system, i.e.,
components of the lipase regulation cascade. A hyper-producing strain can thus be
developed and used in a commercial setting. 

In one embodiment of the invention, an isolated nucleic acid encoding a
kinase that can regulate the expression of a protein, preferably a lipase, is provided.
The nucleic acid encoding a kinase is preferably derived from a Gram-negative
bacterium such as a pseudomonad, preferably from Pseudomonas alcaligenes, and is
most preferably lipQ. Further, the nucleic acid encoding the kinase preferably has the
sequence as shown in Figures 1A-1B (SEQ ID NO: 1) and/or has at least 50%
homology with that sequence. The kinase protein is also provided and it is
preferably derived from a bacterium, preferably from a Gram-negative bacterium such
as a pseudomonad; most preferably, the kinase is from Pseudomonas alcaligenes.
In a preferred embodiment, the kinase is LipQ. The kinase preferably has the
sequence shown in Figures 1A-1B (SEQ ID NO: 2) and/or has at least 50%
homology with that sequence.

In another embodiment, the present invention provides a nucleic acid
encoding a kinase that can regulate the expression of a lipase in Pseudomonas
alcaligenes. In another embodiment, the present invention provides a kinase
capable of regulating the expression of a lipase in Pseudomonas alcaligenes.

In a further embodiment of the invention, an isolated nucleic acid encoding a
DNA binding regulator that can regulate the expression of a protein, preferably a
lipase, is provided. The DNA binding regulator nucleic acid is preferably lipR.
Further, it preferably has the sequence as shown in Figures 2A-2B
(SEQ ID NO: 3) and/or has at least 50% homology with that sequence. The DNA
binding regulator protein is also provided and it is preferably LipR. The DNA binding
regulator preferably has the sequence shown in Figures 2A-2B (SEQ ID NO: 4)
and/or has at least 50% homology with that sequence. Preferably, the DNA binding
regulator is from bacteria. More preferably, the DNA binding regulator is from a
Gram-negative bacterium such as a pseudomonad. Most preferably, the DNA binding
regulator is from Pseudomonas alcaligenes.

In yet a further embodiment, the present invention provides an isolated
nucleic acid that encodes a DNA binding regulator that can regulate the expression
of a lipase in Pseudomonas alcaligenes. In another embodiment, the present
invention provides the DNA binding regulator itself.

In yet another embodiment of the invention, a nucleic acid encoding a portion
of a polymerase that can regulate the expression of a protein, preferably a lipase, is
provided. The polymerase nucleic acid is preferably orfZ. Further, it preferably has
the sequence as shown in Figures 9A-9B (SEQ ID NO: 36) and/or has at least 75%
homology with that sequence. A portion of the polymerase protein is also provided
and it is preferably OrfZ. The polymerase protein preferably has the sequence
shown in Figures 9A-9B (SEQ ID NO: 37) and/or has at least 75% homology with that
sequence. Preferably, the polymerase is from a Gram-negative bacterium such as a
pseudomonad. Most preferably, the polymerase is from Pseudomonas alcaligenes.

In another embodiment, the kinase, the DNA binding regulator and a portion 
of the polymerase are present in one nucleic acid. In another embodiment, the 
kinase, the DNA binding regulator and the polymerase have the nucleic acid 
sequence shown in Figures 4A-4G (SEQ ID NO: 28). 

In another embodiment of the invention, an isolated nucleic acid encoding a
Pseudomonas alcaligenes sigma 54 promoter is provided.

In a further embodiment of the invention, an isolated nucleic acid encoding a
Pseudomonas alcaligenes upstream activating sequence is provided. The upstream
activating sequence is preferably UAS. Further, it preferably has the sequence as
shown in SEQ ID NO: 5 and/or has at least 50% homology with that sequence.
Preferably, the upstream activating sequence is from bacteria. More preferably, the
upstream activating sequence is from a Gram-negative bacterium such as a
pseudomonad. Most preferably, the upstream activating sequence is from
Pseudomonas alcaligenes.

In yet another embodiment of the invention, isolated nucleic acids encoding
secretion factors are provided. The secretion factors are preferably XcpP, XcpQ,
OrfV, OrfX, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and
another protein, OrfY, having the C-terminal amino acid sequence shown in SEQ ID
NO: 35. Further, they preferably have the nucleic acid sequences shown in SEQ
ID NOS: 12, 14, 30, 16, 6, 8, 10, 18, 20, 22, 24, 26, 32 and 34, respectively, and/or
have at least 90% homology with those sequences. The secretion factor proteins are
also provided and preferably have the amino acid sequences shown in SEQ ID
NOS: 13, 15, 31, 17, 7, 9, 11, 19, 21, 23, 25, 27, 33 and 35, respectively, and/or
have at least 90% homology with those sequences. Preferably, the secretion factors
are from bacteria. More preferably, the secretion factors are from a Gram-negative
bacterium such as a pseudomonad. Most preferably, the secretion factors are from
Pseudomonas alcaligenes.

In a further embodiment, the genes encoding the secretion factors XcpP,
XcpQ, OrfV, OrfX, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpY, XcpX and OrfY
are present in one nucleic acid having the DNA sequence shown in Figures 3AA-3BB
(SEQ ID NO: 29). The two xcp gene clusters, xcpP-Q and xcpR-Z, are oriented
divergently, with orfV and orfX located between them, as shown in Figure 8.

Another embodiment of the invention includes an isolated nucleic acid
encoding a Pseudomonas alcaligenes lux-box binding element and orfV-box binding
elements that can regulate expression of a protein.

Yet another embodiment provides nucleic acids that can hybridize to the
nucleic acids shown in SEQ ID NOS: 1, 3, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26,
30, 32, 34 and 36 under high stringency conditions.

In a further embodiment, there is provided an expression system comprising
a gene encoding a protein functionally linked to nucleic acids encoding a kinase, a
DNA binding regulator, a polymerase, a promoter and an upstream activating
sequence. The expression system can also include secretion factors and their
regulatory regions. Preferably, the regulating elements and the secretion factors
are from bacteria. More preferably, the regulating elements and the secretion
factors are from a Gram-negative bacterium such as a pseudomonad. Most
preferably, the regulating elements and the secretion factors are from
Pseudomonas alcaligenes.

Another embodiment provides an expression system that can regulate the
expression of a lipase in Pseudomonas alcaligenes.

In another embodiment of the invention, replicating plasmids and integrating
plasmids containing the expression system or a nucleic acid encoding one or more
of the secretion factors are provided.

Also provided are methods of transforming a host cell with a plasmid that
contains the expression system and/or a nucleic acid encoding one or more
secretion factors, as well as transformed host cells containing the expression system
and/or a nucleic acid encoding one or more secretion factors. A host cell is
transformed by introducing the plasmid into the host cell under appropriate conditions.
Preferably, the host cell is electroporated to allow the plasmid to enter the host cell.
Preferably, the host cell is a bacterium. More preferably, the host cell is a
Gram-negative bacterium such as a pseudomonad. Most preferably, the host cell is
Pseudomonas alcaligenes.

Brief Description of the Drawings 

Figures 1A-1B show the DNA (SEQ ID NO: 1) and amino acid sequences
(SEQ ID NO: 2) of LipQ from Pseudomonas alcaligenes.

Figures 2A-2B show the DNA (SEQ ID NO: 3) and amino acid sequences 
(SEQ ID NO: 4) of LipR from Pseudomonas alcaligenes. 

Figures 3AA-3BB show the DNA sequence (SEQ ID NO: 29) of 17,612 bp
from the insert on cosmid #600 containing the secretion factors XcpQ, XcpP, OrfV,
OrfX, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and a part of
another protein, OrfY, from Pseudomonas alcaligenes. The predicted amino acid
sequences of the open reading frames (SEQ ID NOS: 13, 15, 31, 17, 7, 9, 11, 19,
21, 23, 25, 27, 33 and 35, respectively) are shown in one-letter code below the DNA
sequence. Likewise, the terminator sequences are shown as two bolded convergent
arrows and the binding elements for the regulator OrfV (orfV-boxes) are shown as a
bolded, bordered line.

Figures 4A-4G show the DNA sequence (SEQ ID NO: 28) of the overlapping
4,377 bp fragment of cosmids #71, #201, #505 and #726 that includes the open
reading frames of LipQ, LipR and a part of OrfZ from Pseudomonas alcaligenes.
The predicted amino acid sequences of the open reading frames (SEQ ID NOS: 2, 4
and 37, respectively) are shown in one-letter code below the DNA sequence.
Likewise, the terminator sequence is shown as two bolded convergent arrows, and the
binding element for autoinducers (lux-box) and the binding elements for OrfV
(orfV-boxes) are shown as a bolded, bordered line.

Figure 5 shows the effect on lipase production of cosmid #505 at 10 liter
scale. A threefold higher yield of lipase after fermentation was observed.

Figure 6 shows production plasmid stability in production strain Ps1084 and
Ps1084 + cosmid #600 as determined by neomycin resistance.

Figure 7 shows the theoretical scheme for the action of LipQ, LipR, the 
sigma 54 promoter and the upstream activating sequence on the DNA strand 
encoding LipA. The small rectangle on the DNA strand below the D-domain of LipR
is the upstream activating sequence (UAS).

Figure 8 shows the orientation of the xcp-genes from Pseudomonas
alcaligenes on the map of cosmid #600 as extracted from SEQ ID NO: 29.

Figures 9A-9B show the DNA (SEQ ID NO: 36) and amino acid sequence
(SEQ ID NO: 37) of OrfZ from Pseudomonas alcaligenes.

Figure 10 shows the proposed model for the regulation cascade of the lipase
from Pseudomonas alcaligenes.

Detailed Description of the Invention 

In order to further improve lipase expression in Pseudomonas alcaligenes, a
pragmatic search for limiting factors was initiated. A cosmid library from the
wild-type P. alcaligenes genome was used as a donor of DNA fragments to be introduced
into a multicopy P. alcaligenes lipase production strain. In total, 485 cosmids were
transformed, followed by screening of the cosmid-containing P. alcaligenes strains with
respect to their lipase production activity. Twenty cosmid strains were selected,
each of which showed a significant enhancement of lipase expression as judged
from various liquid and plate tests. The corresponding cosmids were also tested in
a single copy lipase strain and some of them were found to give a threefold
increase of lipase expression. The four best cosmids were found to share an
overlapping fragment of 5.6 kb. The lipase stimulating activity was localized on a
4.5 kb fragment.

The present invention relates to the identification of a Pseudomonas
alcaligenes lipase regulation cascade, which contains multiple components
associated with the expression of lipase. As used herein, the term "regulation
cascade" relates to the entire complex of individual components identified herein,
such as the kinase, DNA binding regulator, polymerase, UAS, lux-box, orfV-boxes,
secretion factors and their regulatory regions. Components of the regulation
cascade can be used alone or in combination with other components to modulate
the expression of proteins in host cells. In a preferred embodiment, the host cell is
a Gram-negative host. In another embodiment, the host cell is a pseudomonad. In
another preferred embodiment, the host cell is Pseudomonas alcaligenes.

Preferred desired proteins for expression include enzymes such as
esterases; hydrolases including proteases, cellulases, amylases, carbohydrases,
and lipases; isomerases such as racemases, epimerases, tautomerases, or
mutases; transferases, kinases and phosphatases. The proteins may be
therapeutically significant, such as growth factors, cytokines, ligands, receptors and
inhibitors, as well as vaccines and antibodies. The proteins may be commercially
important, such as proteases, carbohydrases such as amylases and glucoamylases,
cellulases, oxidases and lipases. The gene encoding the protein of interest may be
a naturally occurring gene, a mutated gene or a synthetic gene.

The 4.5 kb fragment was sequenced and found to encode the LipQ, LipR
and polymerase proteins (Figures 4A-4G). While not intending to be bound by
theory, it is believed that these proteins are involved in the regulation of the sigma
54 promoter in front of the lipase (LipA) and lipase modulator (LipB) gene region
(see Figure 7). These sigma 54 promoters characteristically have an upstream
enhancer region, herein the upstream activating sequence or UAS, which is
regulated by proteins. Regulation can be achieved by either a two-component
system, such as NtrB-NtrC, or by a one-component system, for example NifA, in
which the protein is in close association with the substrate (reviewed by Morett and
Segovia, supra).

According to the present invention, expression of a protein can be regulated 
when a kinase and a DNA binding regulator, which are provided in trans, interact 
with a promoter and/or an upstream activating sequence which are functionally
linked to a gene encoding the protein of interest. Preferably, the expression of the
protein is increased.

A "kinase" is an enzyme that can catalyze the transfer of phosphate to either
itself or another protein. The kinase of the present invention is preferably LipQ, a
kinase that can regulate the expression of a lipase. A LipQ has been isolated from
Pseudomonas alcaligenes. As such, the kinase preferably is encoded by a nucleic
acid having the DNA sequence shown in Figures 1A-1B (SEQ ID NO: 1) and has
the amino acid sequence shown in Figures 1A-1B (SEQ ID NO: 2). A kinase can
act alone or as part of an expression system to regulate the expression of the
protein. In some cases, the absence of this kinase will cause the expression of the
protein to be decreased or eliminated.

A "DNA binding regulator" is a proteinaceous substance which physically
interacts with DNA and, in doing so, influences the expression of genes close to the
binding position. The DNA binding regulator is preferably LipR, a DNA binding
regulator that can regulate the expression of a lipase. A LipR has been isolated
from Pseudomonas alcaligenes. As such, the DNA binding regulator preferably is
encoded by a nucleic acid having the DNA sequence shown in Figures 2A-2B (SEQ
ID NO: 3) and has the amino acid sequence shown in Figures 2A-2B (SEQ ID NO:
4). A DNA binding regulator can act alone or as part of an expression system to
regulate the expression of the protein. A DNA binding regulator of the present
invention can be used alone or in combination with a kinase. The present invention
encompasses variants of the DNA binding regulator disclosed herein that are
capable of autophosphorylation. Such variants can lead to a constitutively higher
expression of the target protein. In some cases, the absence of this DNA binding
regulator will cause the expression of the protein to be decreased or eliminated.

As used herein, "polymerase" refers to an enzyme that elongates DNA or
RNA to obtain larger strands of either DNA or RNA, respectively. It is one of the
most crucial factors in the production of proteins, such as lipase. In a preferred
embodiment, the polymerase is OrfZ. Thus, the polymerase preferably is encoded
by a nucleic acid having the DNA sequence shown in Figures 9A-9B (SEQ ID NO: 36)
and has the amino acid sequence shown in Figures 9A-9B (SEQ ID NO: 37). The
polymerase may play a role in modifying the expression of the desired protein.

Promoters are DNA elements that can promote the expression of a protein.
A "sigma 54 promoter" is a bacterial promoter recognized by a member of a class of
sigma factors with a size of approximately 54 kDa. These sigma factors are also known
as RpoN proteins. Sigma 54 promoters and their functions are discussed in Morett
and Segovia (1993) J. Bacter. 175:6067-6074. Preferably, the promoter is a
Pseudomonas alcaligenes sigma 54 promoter. Most preferably, the sigma 54
promoter is the lipase promoter of P. alcaligenes (SEQ ID NO: 5) (WO 94/02617).
According to the present invention, the sigma 54 promoter has an upstream
activating sequence.

An "upstream activating sequence" is a binding position for a positively-acting
DNA binding regulator. As indicated by its name, the upstream activating
sequence is upstream of the transcription start site and is a nucleic acid. The
upstream activating sequence is preferably UAS, an upstream activating sequence
that can regulate the expression of a lipase, and is preferably derived from
Pseudomonas alcaligenes. An upstream activating sequence can act alone or as
part of an expression system to regulate the expression of the protein. In some
cases, the absence of this upstream activating sequence will cause the expression
of the protein to be decreased or eliminated. Preferably, the upstream activating
sequence is the consensus: TGT(N)11ACA. In the Pseudomonas alcaligenes
lipase gene sequence, one specific region around -200 bp from the ATG start codon
fits this consensus: TGTtcccctcggtaACA (SEQ ID NO: 5) (WO 94/02617).
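
As a minimal sketch of how a sequence could be checked against the consensus described above, the following Python snippet scans a DNA string for TGT and ACA separated by a spacer. The spacer length of 11 nucleotides is an assumption taken from the example TGTtcccctcggtaACA. The function name find_uas_sites and the flanking bases in the usage line are hypothetical and are not part of the disclosure.

```python
import re

# Assumption: the spacer is 11 nucleotides, as in TGTtcccctcggtaACA.
# A lookahead is used so that overlapping candidate sites are all reported.
UAS_PATTERN = re.compile(r"(?=(TGT[ACGT]{11}ACA))", re.IGNORECASE)


def find_uas_sites(sequence):
    """Return (0-based start, matched text) for each consensus-like site."""
    return [(m.start(), m.group(1)) for m in UAS_PATTERN.finditer(sequence)]


# Hypothetical flanking bases "gg" and "tt" around the site quoted in the text.
print(find_uas_sites("ggTGTtcccctcggtaACAtt"))
```

A match only indicates that a region fits the consensus; whether such a region functions as an upstream activating sequence would still have to be established experimentally.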

A secretion factor is a protein that aids in secreting another protein from a
cell. Preferably, the secretion factor is a member of the Xcp protein family and acts
in concert with other members of the Xcp protein family. A genomic fragment
encoding genes xcpQ, xcpP, orfV, orfX, xcpR, xcpS, xcpT, xcpU, xcpV, xcpW, xcpX,
xcpY, xcpZ and the C-terminal part of protein OrfY has been isolated from
Pseudomonas alcaligenes. As such, the secretion factors preferably are encoded
by a nucleic acid having the DNA sequence shown in Figures 3AA-3BB (SEQ ID
NO: 29). Specifically and more preferably, the XcpP secretion factor is encoded by
the DNA sequence shown in SEQ ID NO: 12 and has the amino acid sequence 
shown in SEQ ID NO: 13; the XcpQ secretion factor is encoded by the DNA 
sequence shown in SEQ ID NO: 14 and has the amino acid sequence shown in 
SEQ ID NO: 15; the OrfV protein is encoded by the DNA sequence shown in SEQ
ID NO: 30 and has the amino acid sequence shown in SEQ ID NO: 31; the OrfX
protein is encoded by the DNA sequence shown in SEQ ID NO: 16 and has the 
amino acid sequence shown in SEQ ID NO: 17; the XcpR secretion factor is 
encoded by the DNA sequence shown in SEQ ID NO: 6 and has the amino acid 
sequence shown in SEQ ID NO: 7; the XcpS secretion factor is encoded by the
DNA sequence shown in SEQ ID NO: 8 and has the amino acid sequence shown in
SEQ ID NO: 9; the XcpT secretion factor is encoded by the DNA sequence shown 
in SEQ ID NO: 10 and has the amino acid sequence shown in SEQ ID NO: 11; the 
XcpU secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 18 
and has the amino acid sequence shown in SEQ ID NO: 19; the XcpV secretion
factor is encoded by the DNA sequence shown in SEQ ID NO: 20 and has the
amino acid sequence shown in SEQ ID NO: 21; the XcpW secretion factor is 
encoded by the DNA sequence shown in SEQ ID NO: 22 and has the amino acid 
sequence shown in SEQ ID NO: 23; the XcpX secretion factor is encoded by the 
DNA sequence shown in SEQ ID NO: 24 and has the amino acid sequence shown in SEQ ID
NO: 25; the secretion factor XcpY is encoded by the DNA sequence shown in SEQ
ID NO: 26 and has the amino acid sequence shown in SEQ ID NO: 27; the secretion 
factor XcpZ is encoded by the DNA sequence shown in SEQ ID NO: 32 and has the 
amino acid sequence shown in SEQ ID NO: 33; a part of protein OrfY is encoded by 
the DNA sequence shown in SEQ ID NO: 34 and has the amino acid sequence
shown in SEQ ID NO: 35.
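
The pairing of secretion factors with sequence identifiers listed above can be summarized as a small lookup table. The following Python sketch simply transcribes the SEQ ID numbers from the preceding paragraph; the dictionary and function names are hypothetical and are introduced only for illustration.

```python
# name -> (DNA SEQ ID NO, amino acid SEQ ID NO), transcribed from the text above.
# The OrfY entry covers only the C-terminal part of the protein.
SECRETION_FACTOR_SEQ_IDS = {
    "XcpP": (12, 13),
    "XcpQ": (14, 15),
    "OrfV": (30, 31),
    "OrfX": (16, 17),
    "XcpR": (6, 7),
    "XcpS": (8, 9),
    "XcpT": (10, 11),
    "XcpU": (18, 19),
    "XcpV": (20, 21),
    "XcpW": (22, 23),
    "XcpX": (24, 25),
    "XcpY": (26, 27),
    "XcpZ": (32, 33),
    "OrfY": (34, 35),
}


def seq_ids_for(name):
    """Return the (DNA, amino acid) SEQ ID NO pair for a named factor."""
    return SECRETION_FACTOR_SEQ_IDS[name]


print(seq_ids_for("XcpR"))
```

In every pair the amino acid sequence identifier directly follows the DNA sequence identifier, which makes transcription slips in the list easy to spot.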

Upstream of the lipQ gene, a promoter region has been identified. Within
this promoter region, a lux-box can be recognized, see SEQ ID NO: 26. This lux-box
shows significant homology to the binding site for luxR type regulator elements,
which are known to be under control of autoinducer (Latifi et al. (1995) Molec.
Microb. 17(2):333-343). This lux-box probably represents a linkage between the
autoinducer system of LipR and lipase regulation. As such, another embodiment of
the invention includes a nucleic acid encoding a lux-box element.

Upstream of the xcpP-Q and xcpR-Z gene clusters, the orfX and orfV genes
(SEQ ID NO: 29) and upstream of the orfZ gene (SEQ ID NO: 28), regulatory regions
are present. A box can be recognized in the promoter region having the consensus
sequence ANAANAANAANAA. These boxes are referred to as orfV-binding
elements, because OrfV shows homology with the well-known Escherichia coli
regulator MalT. Based upon OrfV homology with the known regulator MalT, OrfV
may be a regulator. These orfV-boxes can control the expression of the Xcp
proteins, OrfX as well as OrfV itself. Similarly, the expression of the polymerase
OrfZ may be controlled by the orfV-boxes, as shown in Figure 10. As such, in another
embodiment, the invention provides a nucleic acid encoding an orfV-box
element.
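
To illustrate how candidate boxes fitting the consensus ANAANAANAANAA might be located in a promoter region, the following Python sketch converts the consensus into a regular expression in which N stands for any of A, C, G or T. The function names and the test sequence are hypothetical; the sketch handles only the N ambiguity code and makes no claim about which matches are functional binding elements.

```python
import re

ORFV_BOX_CONSENSUS = "ANAANAANAANAA"  # consensus quoted in the text


def consensus_to_regex(consensus):
    """Translate a consensus string into a regex; N matches A, C, G or T."""
    body = "".join("[ACGT]" if base == "N" else base for base in consensus.upper())
    # The lookahead lets overlapping candidate boxes be reported as well.
    return re.compile("(?=(" + body + "))", re.IGNORECASE)


def find_boxes(sequence, consensus=ORFV_BOX_CONSENSUS):
    """Return (0-based start, matched text) for each region fitting the consensus."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical sequence made up for illustration only.
print(find_boxes("ccATAAGAACAATAAgg"))
```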

Commonly, when describing proteins and the genes that encode them, the
term for the gene is not capitalized and is in italics, e.g., lipQ. The term for the
protein is generally in normal letters and the first letter is capitalized, e.g., LipQ.

The kinase, DNA binding regulator, promoter and upstream activating
sequence will sometimes be referred to as "the regulating elements" for ease of
discussion. The preferred regulating elements are LipQ, LipR, the Pseudomonas
alcaligenes polymerase, the Pseudomonas alcaligenes sigma 54 promoter and
Pseudomonas alcaligenes UAS, and can regulate the expression of a lipase in
Pseudomonas alcaligenes as defined herein. The kinase, the DNA binding
regulator and polymerase are proteins, and the promoter and the upstream
activating sequence are nucleic acids. In transformed cells, DNA encoding the
kinase and DNA binding regulator was multiplied using a plasmid, which led in turn
to a higher production of the kinase and DNA binding regulator. The increased
production of the kinase and DNA binding regulator resulted in higher transcription
from the sigma 54 promoter, which provides higher expression of the protein of
interest.

The kinase and DNA binding regulator of the present invention represent a
two-component regulatory system. Preferably, the two components are LipQ and
LipR and can regulate the expression of a lipase in Pseudomonas alcaligenes as
defined herein. Although other two-component regulatory systems are known, a low
degree of homology exists between individual pieces of those systems and the
amino acid sequences shown in SEQ ID NOS: 2 and 4.

Embodiments of the invention include a kinase or a DNA binding regulator
encoded by a nucleic acid having at least 50% homology with the DNA sequences
shown in SEQ ID NOS: 1 or 3, respectively. Preferably, the homology is at least
70%, more preferably at least 90% and most preferably at least 95%.

Also provided are embodiments in which a secretion factor is encoded by a
nucleic acid having at least 90% homology with the DNA sequence shown in SEQ
ID NOS: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 30, 32 or 34. Preferably, the
homology is at least 95%, more preferably at least 98%. Homology can be
determined by lining up the claimed amino acid or DNA sequence with another
sequence and determining how many of the amino acids or nucleotides match up as
a percentage of the total. Homology can also be determined using one of the
sequence analysis software programs that are commercially available, for example,
the TFastA Data Searching Program available in the Sequence Analysis Software
Package Version 6.0 (Genetics Computer Group, University of Wisconsin
Biotechnology Center, Madison, Wisconsin 53705).
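
The "lining up" calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, under stated assumptions: the two sequences are already aligned to equal length with "-" as the gap character, and "the total" is taken to mean the number of aligned positions, excluding positions that are gaps in both sequences. The text does not specify the denominator, and the function name percent_identity is hypothetical. It is not the TFastA program.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned positions at which the two sequences match.

    Assumes both inputs are pre-aligned strings of equal length. Positions
    that are gaps in both sequences are ignored; a gap opposite a residue
    counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical ten-nucleotide sequences differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

A value returned by such a function could then be compared against the thresholds named in the text, for example at least 50% or at least 90%.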

One can screen for homologous sequences using hybridization as described 
herein or using PCR with degenerate primers. Chen and Suttle (1995) 
Biotechniques 18(4):609-610, 612. 

Also, in several embodiments of the invention, there are provided nucleic
acids that can hybridize with the DNA shown in Figures 1A-1B, 2A-2B, 3AA-3BB
and 9A-9B, SEQ ID NOS: 1, 3, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 30, 32, 34, 36,
respectively, under stringent conditions. Stringent hybridization conditions include
stringent hybridization and washing conditions as are known to one of ordinary skill in
the art. Hybridization and appropriate stringent conditions are described in
Sambrook et al. (1989) Molecular Cloning, 2d ed., Cold Spring Harbor Laboratory
Press, New York.

"Bacteria" include microorganisms of the class Schizomycetes. Bacteria can 
be either Gram-negative or Gram-positive. Gram-negative bacteria include 
members of the genera Escherichia, Hemophilus, Klebsiella, Proteus,
Pseudomonas, Salmonella, Shigella, Vibrio, Acinetobacter, and Serratia.
Gram-positive bacteria include members of the genera Bacillus, Clostridium,
Staphylococcus, Streptomyces, Lactobacillus and Lactococcus. 

Gram-negative bacteria can be pseudomonads, which are strains that are
members of the genus Pseudomonas. Examples include Pseudomonas
aeruginosa, Pseudomonas cepacia, Pseudomonas glumae, Pseudomonas stutzeri,
Pseudomonas fragi, Pseudomonas alcaligenes and Pseudomonas mendocina. A
preferred pseudomonad is Pseudomonas alcaligenes. Pseudomonas alcaligenes is
also sometimes referred to as Pseudomonas pseudoalcaligenes.

Lipases within the scope of the present invention include those encoded by
LipA, which is generally found in close association with a modulating gene known as
LipB, LipH, LipX or Lif. Lif from Pseudomonas alcaligenes is the subject of patent
application WO 94/02617 as discussed above. LipA genes can be found in a
variety of species of bacteria such as Pseudomonas aeruginosa, Pseudomonas
stutzeri, Pseudomonas alcaligenes, Pseudomonas cepacia, Pseudomonas glumae,
Pseudomonas fragi, Pseudomonas mendocina, Acinetobacter calcoaceticus and
Serratia marcescens.

Another embodiment of the invention provides an expression system that
can regulate the expression of a protein, preferably a lipase. The expression
system includes a kinase, a DNA binding regulator, a polymerase, a sigma 54
promoter and an upstream activating sequence. The expression system can also
include secretion factors.

An expression system includes one or more proteins and/or nucleic acids
which, when acting together, can increase the expression of a protein in a host cell.
The expression system can be encoded on one or more plasmids and may or may
not be on the same plasmid as the gene encoding the protein of interest.

The phrase "functionally linked" or "functionally coupled" means that the
regulating elements (DNA or protein) interact physically in order to exert their
function. This can be a protein/protein, DNA/protein or a DNA/DNA interaction. For
example, the DNA binding regulator interacts with the promoter, but the genes encoding
them may be at different sites on the chromosome. As such, the genes encoding
the elements can be on different plasmids from each other and from the gene
encoding the protein of interest and still work together to regulate expression of the
protein.

A plasmid is a nucleic acid molecule which is smaller than the chromosome
and can replicate independently of the mechanisms used for chromosomal
replication. Typically, a plasmid is a circular DNA molecule. Plasmids can be
inserted into host cells where they can replicate and make more copies of the
plasmid; hence, replicating plasmid. Some plasmids, called integrating plasmids,
can insert the plasmid DNA into the chromosome of the host cell. The plasmid DNA
is thus integrated into the chromosome of the host cell. When this happens, the
plasmid no longer replicates autonomously but instead replicates in synchrony with
the chromosome into which it has been inserted. Thus, whereas a non-integrated
plasmid may be present at several dozen copies per chromosome and replicate
independently of the chromosome, the integrated plasmid is present at one copy
per chromosome and can replicate only when the chromosome does so.

One embodiment of the invention is directed to a method of transforming a
host cell with a plasmid that includes the nucleic acid encoding the expression
system. A host cell is a cell into which a plasmid of the present invention can be
inserted through, for example, transformation. The host cell is preferably a bacterium.
In one embodiment, the host cell is preferably a Gram-negative bacterium. In another
preferred embodiment the host cell is a pseudomonad. Preferably, the host cell is
Pseudomonas alcaligenes and the regulating elements of the expression system
are from Pseudomonas alcaligenes. The same host cell can be transformed with a
further plasmid that includes a nucleic acid that encodes one or more secretion
factors. Preferably, the secretion factors are from Pseudomonas alcaligenes.

A transformed host cell is a host cell into which one or more plasmids have
been inserted. Transformation can take place by first making the host cell
competent to receive the plasmid. The naked DNA is then added directly to the
cells, and some of the cells take it up and replicate or integrate it. One way of
making the cells competent to receive the plasmid is by electroporation as described
in the Examples below. Another method that is useful for construction and
transferring of cosmid libraries is triparental mating. Kelly-Wintenberg and Montie
(1989) J. Bacteriol. 171(11):6357-62.

Lipases produced according to the present invention can be used in a
number of applications. Lipases can be used in detergents and other cleaning
formulations as well as a number of industrial processes.

Experimental
Materials and Methods 
Bacterial Strains 

All bacterial strains were propagated with 2xTY as a liquid or solid medium,
unless otherwise stated, and are listed in Table 1. For P. alcaligenes strains, the
medium was supplemented with the appropriate antibiotics: neomycin (10 mg/l),
tetracycline (5 mg/l) and chloramphenicol (3 mg/l); and for transformed Escherichia
coli, ampicillin was added at 100 mg/l. For cosmid-containing Escherichia coli
strains, the medium was supplemented with tetracycline (10 mg/l). P. alcaligenes
and E. coli were grown at 37°C, aerobically.



Table 1. Bacterial strains used. TetR, tetracycline resistant; NeoR, neomycin
resistant; CapR, chloramphenicol resistant; lip, lipase.

Strain        Relevant Characteristics

P. alcaligenes:
Ps #1         Cosmid #1 in Ps824, TetR, lip-
Ps #26        Cosmid #26 in Ps824, TetR, lip-
Ps #27        Cosmid #27 in Ps824, TetR, lip-
Ps #57        Cosmid #57 in Ps824, TetR, lip-
Ps #71        Cosmid #71 in Ps824, TetR, lip-
Ps #91        Cosmid #91 in Ps824, TetR, lip-
Ps #131       Cosmid #131 in Ps824, TetR, lip-
Ps #201       Cosmid #201 in Ps824, TetR, lip-
Ps #344       Cosmid #344 in Ps824, TetR, lip-
Ps #371       Cosmid #371 in Ps824, TetR, lip-
Ps #399       Cosmid #399 in Ps824, TetR, lip-
Ps #401       Cosmid #401 in Ps824, TetR, lip-
Ps #404       Cosmid #404 in Ps824, TetR, lip-
Ps #490       Cosmid #490 in Ps824, TetR, lip-
Ps #505       Cosmid #505 in Ps824, TetR, lip-
Ps #540       Cosmid #540 in Ps824, TetR, lip-
Ps #597       Cosmid #597 in Ps824, TetR, lip-
Ps #600       Cosmid #600 in Ps824, TetR, lip-
Ps #638       Cosmid #638 in Ps824, TetR, lip-
Ps #726       Cosmid #726 in Ps824, TetR, lip-
Lip34         NeoR, lip+
Ps537         lip+ (cured from production plasmid p24Lipo1)
Ps824         lip- (Lip34 cured from production plasmid p24Lipo1)
Ps1084        2 copies lipQ-R, lip+, NeoR, CapR
Ps93          res-, mod+
Ps1108        Ps93 containing inactivation of LipR in chromosome

E. coli K12:
K802          hsdR-, hsdM+, gal-, met-, supE
WK6           Δ(lac-proAB), galE, strA/F' lacIq, ZΔM15, proA+B+

Table 2. Plasmids used.

Plasmid       Relevant Characteristics                    Reference

pLAFR3        Cosmid vector derived from pLAFR1, TetR     Staskawicz et al. 1987
p24Lipo1      lip+, NeoR                                  equivalent to p24A28 (see WO94/02617)
pUC19         lacZ', rop-                                 Yanisch-Perron et al. 1985


Extraction of Extra-Chromosomal DNA

Cosmid and plasmid isolations were performed using the QIAprep Spin
Plasmid kit, for 1 ml overnight culture, and the QIAfilter Plasmid Midi Kit, for 100 ml
culture isolations (both Qiagen), according to the manufacturer's instructions. For
Pseudomonas strains, lysozyme (10 µl/ml) was added to the resuspension mix and
incubated for 5 minutes at 37°C to aid cell lysis. Cosmid DNA was eluted from the
QIAprep columns with 70°C milliQ water, as recommended by the manufacturer.
For cosmid isolations from 100 ml cultures, strains were grown overnight in Luria
Bertani (LB) broth and the elution buffer was heated to 50°C.



Transformation of Pseudomonas alcaligenes

An overnight culture of P. alcaligenes was diluted 1:100 in fresh 2xTY
medium (with 10 mg/l neomycin) and the culture incubated at 37°C, in an orbital
shaker, until it had reached an OD550 of 0.6-0.8. Following centrifugation (10
minutes at 4000 rpm), the bacterial pellet was washed twice with a half volume SPM
medium (276 mM sucrose; 7 mM NaHPO4 (pH 7.4); 1 mM MgCl2). The cells were
then resuspended in a 1/100 volume SPM medium. Cosmid DNA and 40 µl cells
were mixed together and transferred to a 2 mm gap electroporation cuvette (BTX).
The cells were electroporated with 1.4 kV, 25 µF, 200 Ω, in the Gene Pulser. The
electroporation cuvette was washed out with 1 ml 2xTY medium and the cell mixture
transferred to a clean 1.5 ml eppendorf. The transformation mixture was then
incubated for 45 minutes at 37°C. After incubation, 100 µl was plated onto 2xTY
agar supplemented with tetracycline (5 mg/l) or neomycin (10 mg/l) or both
(depending on which P. alcaligenes strain is used for electroporation). The
transformation of P. alcaligenes cells was carried out at room temperature.
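
The volume ratios in the electroporation protocol above can be scaled to any starting culture. The following Python sketch is illustrative only: the function name and dictionary keys are hypothetical, and it assumes that "a half volume" and "a 1/100 volume" are both taken relative to the original culture volume, which the text implies but does not state outright.

```python
def electroporation_volumes(culture_ml: float, aliquot_ul: float = 40.0) -> dict:
    """Scale the protocol's volume ratios to a chosen culture volume.

    Assumption: wash and resuspension volumes are fractions of the
    original culture volume.
    """
    resuspension_ml = culture_ml / 100.0  # cells resuspended in 1/100 volume SPM
    return {
        "overnight_inoculum_ml": culture_ml / 100.0,  # 1:100 dilution into fresh 2xTY
        "spm_per_wash_ml": culture_ml / 2.0,          # each of the two washes uses a half volume
        "resuspension_ml": resuspension_ml,
        # number of 40 µl cell aliquots (one per cuvette) the resuspension yields
        "aliquots": int(resuspension_ml * 1000.0 // aliquot_ul),
    }


volumes = electroporation_volumes(100.0)
print(volumes)
```

For a 100 ml culture this gives a 1 ml inoculum, 50 ml of SPM per wash, a 1 ml final resuspension and 25 aliquots of 40 µl.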


Transformation of Escherichia coli

Transformation of E. coli WK6 cells was performed using electroporation.
Transfer of the cosmids to E. coli K802 cells was performed by infection according to
the supplier's instructions (Promega Corporation).


Example 1
Construction of a Cosmid Library from
Pseudomonas alcaligenes DNA in E. coli
Chromosomal DNA extracted from P. alcaligenes was fractionated and
ligated into cosmid pLAFR3 as described in the Materials and Methods section,
above. After ligation, the mixture was transferred into E. coli as described.
Tetracycline resistant colonies were isolated and cosmid DNA was prepared from
each of them.

Example 2

Transformation of a P. alcaligenes Cosmid Library into
P. alcaligenes Overexpressing Lipase

In total, 531 plasmid DNA preparations were isolated from E. coli grown
cosmids. With the aid of electroporation (see Methods, above) these were
transformed into strain Lip34, a P. alcaligenes strain harboring plasmid p24Lipo1
expressing lipase, resulting in 485 cosmid-containing P. alcaligenes strains. For
transformation, methods as described were used.

Example 3

Selection of Cosmids Stimulating Lipase Expression

In total, 485 cosmids were transformed, followed by screening of cosmid-
containing P. alcaligenes strains with respect to their lipase production activity.
Twenty cosmid strains were selected which showed a significant enhancement of
lipase expression as judged from various liquid and plate tests (see Table 3). The
corresponding cosmids were also tested in a single copy lipase strain and some of
them were found to give a threefold increase in lipase expression. The four best
cosmids were found to share an overlapping fragment of 5.6 kb. The lipase
stimulating activity was localized on a 4.5 kb fragment of cosmids #71, #201, #505
and #726. Sequence analysis of this fragment revealed two open reading frames which
showed homology with two component regulatory systems (see Figures 4A-4G).
We have named the genes lipQ, lipR and orfZ. It should be noted that, of the four
described cosmid strains, only the strains containing cosmids #71, #505 and #726, which
have the complete OrfZ, give the highest lipase stimulation in the lactate test
(second column in Table 3) in comparison to the strain containing cosmid #201.



Table 3.

Cosmid #    Medium 380 + Soy Oil    380 + Lactate    2xTY + hexadecane

1           35.25                   19.00            13.00
26          35.25                   14.75             9.00
27          26.50                   18.25            10.00
57          35.75                    9.25             7.50
71          40.25                   27.25            16.67
91          22.75                   23.00            18.00
131         41.30                   11.00             3.00
201         39.00                   18.00            10.00
344         32.50                   11.00             8.30
371         25.50                   13.75            15.00
399         23.00                   27.00             9.00
401         26.25                   11.75             3.00
404         23.75                   21.00             7.00
490         27.00                   13.25            16.00
505         63.50                   28.75            15.00
540         50.50                   17.75             4.25
597         47.00                   25.25            25.25
600         32.00                   17.00            19.00
638         34.75                    8.25            11.00
726         36.75                   25.25            21.00
control     20.80                   11.50            11.50
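
The comparison drawn in Example 3 between cosmids #71, #505, #726 and #201 can be made explicit by dividing each Table 3 value by the control value for the same medium. The Python sketch below is a reading aid, not part of the original analysis: the variable and function names are hypothetical, the table does not state the units of the measurements, and "fold over control" is simply one convenient way to express the stimulation.

```python
# Table 3 values for the four overlapping cosmids and the control.
CONTROL = {"soy_oil": 20.80, "lactate": 11.50, "hexadecane": 11.50}
COSMIDS = {
    "71":  {"soy_oil": 40.25, "lactate": 27.25, "hexadecane": 16.67},
    "201": {"soy_oil": 39.00, "lactate": 18.00, "hexadecane": 10.00},
    "505": {"soy_oil": 63.50, "lactate": 28.75, "hexadecane": 15.00},
    "726": {"soy_oil": 36.75, "lactate": 25.25, "hexadecane": 21.00},
}


def fold_over_control(values: dict, control: dict) -> dict:
    """Ratio of each measurement to the control measured in the same medium."""
    return {medium: values[medium] / control[medium] for medium in control}


fold = {cosmid: fold_over_control(v, CONTROL) for cosmid, v in COSMIDS.items()}

# Rank the four cosmids by their stimulation in the lactate test.
ranked = sorted(fold, key=lambda c: fold[c]["lactate"], reverse=True)
for cosmid in ranked:
    print(cosmid, round(fold[cosmid]["lactate"], 2))
```

On the lactate medium, cosmids #505, #71 and #726 all exceed a twofold ratio over the control, whereas cosmid #201 stays below it, in line with the observation about the complete OrfZ.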



Example 4

Evidence for Involvement of LipQ/LipR in Lipase Expression
In order to assess the role of the lipQ/lipR operon, an insertional inactivation
of the LipR ORF was constructed in the chromosome of strain Ps93. The resulting
mutant, Ps1108, showed a significantly reduced halo on tributyrin agar plates as
compared to Ps93.

In a second experiment the lipase expression plasmid, p24Lipo1, was
introduced into strain Ps1108. The lipase expression was severely impaired as
compared to Ps93 harboring p24Lipo1.

These observations suggest that the lipQ/lipR operon encodes the lipase regulatory
proteins.

Example 5

Construction and Characterization of a LipQ/LipR
Overexpressing P. alcaligenes Strain

The 4.5 kb EcoRI-HindIII fragment of one of the four lipase stimulating
cosmids (#201) was subcloned onto pLAFR3 and inserted into a P. alcaligenes
strain with a single lipase gene on the chromosome (Ps537). A threefold higher
yield of lipase after a 10 liter fermentation was observed. (See Figure 5.)

Subsequently, the 4.5 kb EcoRI-HindIII fragment was inserted onto the lipase
expression plasmid p24Lipo1. A higher lipase expression was observed, as could be
concluded from halo size on tributyrin plates. During growth in a shake flask,
plasmid instability was observed. In order to overcome this instability, the fragment
was also integrated into the chromosome, resulting in a strain with two lipQ/lipR gene
copies in the chromosome (strain Ps1084). Insertion of the lipase expression
plasmid p24Lipo1 in this strain resulted in higher lipase expression on the plate, but
a plasmid instability during fermentation.

Example 6

Effect of Cosmid #600 on Production
Plasmid Stability in Ps1084
Previously, a P. alcaligenes strain had been developed in which a second
copy of lipQ-R had been integrated into the chromosome. When a lipase
production plasmid (plasmid p24Lipo1) was introduced at high copy number (20)
into Ps1084 and the strain fermented (10 liters), plasmid instability was observed. A
shake-flask experiment was developed to model the situation in the fermenter. To
monitor production plasmid stability and cosmid stability of transformed Ps1084, a
week-long shake-flask experiment was set up. After overnight growth in 10 ml 2xTY
broth (supplemented with the required amount of neomycin and tetracycline), 1 ml
of transformed culture was used to inoculate 100 ml fermentation medium 380 plus
200 µl soy oil, in shake-flasks. The inoculated shake-flasks were incubated for 24
hours at 37°C in an orbital shaker. One ml of 24 hour old culture was then used to
inoculate successive shake-flasks. Throughout the duration of the experiment,
daily samples were taken. The presence of a neomycin marker on the lipase
production plasmid was used to monitor plasmid stability. The integrated lipQ-R
strain with the high copy lipase production plasmid (Ps1084) was transformed with
cosmid #600 to see whether plasmid stability was improved.

Figure 6 is a graphical representation of production plasmid stability in the
transformed and untransformed Ps1084 (in duplicate). After 3-4 days, plasmid
instability was detected in Ps1084, observed as the 80% drop in neomycin resistant
colonies. Throughout the week-long experiment, cosmid #600 transformed Ps1084
maintained a high degree of neomycin resistance, suggesting that cosmid #600
stabilized the production plasmid.
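
The bookkeeping behind a serial shake-flask stability experiment of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to produce Figure 6: the function names are hypothetical, the colony counts in the usage lines are made-up placeholders rather than measured data, and the generation estimate assumes each flask regrows to the same cell density as the culture it was inoculated from.

```python
import math


def generations_per_passage(inoculum_ml: float, fresh_medium_ml: float) -> float:
    """Doublings needed to regrow after a dilution (assumes equal final density)."""
    dilution = (inoculum_ml + fresh_medium_ml) / inoculum_ml
    return math.log2(dilution)


def percent_resistant(neo_resistant_colonies: int, total_colonies: int) -> float:
    """Share of colonies still carrying the neomycin marker of the production plasmid."""
    return 100.0 * neo_resistant_colonies / total_colonies


def relative_drop(before_pct: float, after_pct: float) -> float:
    """Relative loss of resistant colonies between two time points, in percent."""
    return 100.0 * (before_pct - after_pct) / before_pct


# 1 ml of culture into 100 ml of fresh medium, as in the shake-flask passages.
print(round(generations_per_passage(1.0, 100.0), 2))

# Placeholder counts only: 100 of 100 colonies resistant early, 20 of 100 later.
early = percent_resistant(100, 100)
late = percent_resistant(20, 100)
print(relative_drop(early, late))
```

Under these assumptions each daily passage corresponds to roughly 6.7 generations, so a week of passages exposes the plasmid to several dozen generations of growth.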

Example 7

Characterization of Cosmid #600
Cosmid #600 gave a positive signal when PCR was carried out using xcpR
primers based on peptides from xcpR derived from Pseudomonas aeruginosa.
DNA from cosmid #600 was digested with EcoRV and the resulting
fragment mixture and purified fragments were ligated with SmaI-digested pUC19
(Appligene) using the Rapid DNA Ligation kit (Boehringer Mannheim). E. coli cells
were then electroporated. Transformants were selected on 2xTY plates containing
ampicillin (100 mg/l), X-Gal (Boehringer Mannheim; 40 mg/l) and IPTG (Gibco BRL;
1 mM). Transformants containing the recombinant plasmid were identified as white
colonies, and single colonies were streaked onto fresh 2xTY agar plates (with
ampicillin) for purity.

Sequencing of PCR products, cosmid #600 DNA and subclones of cosmid
#600 (see above) was achieved by the dye deoxy termination method, using the
ABI PRISM Dye Terminator Cycle Sequencing Ready Reaction kit with
AmpliTaq® DNA Polymerase, FS (Perkin Elmer) in conjunction with the Applied
Biosystems 373A sequencer.

Sequencing of cosmid #600 was initiated with the primers used in the PCR
to detect xcpR. In accordance with the restriction map of cosmid #600 (Figure 8),
an EcoRV restriction site was identified in the nucleic acid sequence of the PCR
product. Sequence analysis revealed that the 609 bp amplification product could be
translated to a putative amino acid sequence with 89% homology with the P.
aeruginosa and 73% with the P. putida XcpR protein (amino acid residues 59-262),
verifying that the xcpR gene had been identified by PCR.

Figure 8 shows the map of cosmid #600. By doing a PCR reaction with
digested DNA, we were able to deduce the location of xcpR on the insert. The
position of the xcpR gene suggests that the complete Xcp operon is present in
cosmid #600.

To date, 17,612 nucleotides, encompassing xcpP, xcpQ, orfV, orfX, xcpR,
xcpS, xcpT, xcpU, xcpV, xcpW, xcpX, xcpY, xcpZ and part of protein OrfY, have
been sequenced (Figures 3AA-3BB, SEQ ID NO: 29).

While the invention has been described in connection with specific
embodiments thereof, it will be understood that it is capable of further modifications
and this application is intended to cover any variations or adaptations of the
invention following, in general, the principles of the invention and including such
departures from the present disclosure as come within known or customary practice
within the art to which the invention pertains and as may be applied to the essential
features hereinbefore set forth, and as follows in the scope of the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION 

(i) APPLICANT: Gerritse, Gijsbert 
Quax, Wilhelmus J. 

(ii) TITLE OF THE INVENTION: EXPRESSION SYSTEM FOR ALTERED 

EXPRESSION LEVELS 

(iii) NUMBER OF SEQUENCES: 37 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Genencor International 

(B) STREET: 925 Page Mill Road 

(C) CITY: Palo Alto 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 94304-1013

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/699,092 

(B) FILING DATE: 16-AUG-1996 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Glaister, Debra J.

(B) REGISTRATION NUMBER: 33,888

(C) REFERENCE / DOCKET NUMBER: GC361-2 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 650-846-7620 

(B) TELEFAX: 650-845-6504 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1029 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATGGGCGTAT GTTCGCTGGC CAAGGACCAG GAAGTGCTGA TGTGGAACCG CGCCATGGAG 60 

GAACTCACCG GCATCAGCGC GCAGCAGGTG GTCGGCTCGC GCCTGCTCAG CCTGGAGCAC 120 

CCCTGGCGCG AGCTGCTGCA GGACTTCATC GCCCAGGACG AGGAGCACCT GCACAAGCAG 180 

CACCTGCAAC TGGACGGCGA GGTGCGCTGG CTCAACCTGC ACAAGGCGGC CATCGACGAA 240 

CCGCTGGCGC CGGGCAACAG CGGCCTGGTG CTGCTGGTCG AGGACGTCAC CGAGACCCGC 300
GTGCTGGAAG ACCAGCTGGT GCACTCCGAG CGTCTGGCCA GCATCGGCCG CCTGGCCGCC 360

GGGGTGGCCC ACGAGATCGG CAATCCGGTC ACCGGCATCG CCTGCCTGGC GCAGAACCTG 420 

CGCGAGGAGC GCGAGGGCGA CGAGGAGCTC GGCGAGATCA GCAACCAGAT CCTCGACCAG 480

ACCAAGCGCA TCTCCCGCAT CGTCCAGTCG CTGATGAACT TCGCCCACGC CGGCCAGCAG 540

CAGCGCGCCG AATACCCGGT GAGCCTGGCC GAAGTGGCGC AGGACGCCAT CGGCCTGCTG 600

TCGCTGAACC GCCATCGCAC CGAAGTGCAG TTCTACAACC TGTGCGATCC CGAGCACCTG 660 

GCCAAGGGCG ACCCGCAGCG CCTGGCCCAG GTGCTGATCA ACCTGCTGTC CAACGCCCGC 720 

GATGCCTCGC CGGCCGGCGG TGCCATCCGC GTGCGTAGCG AGGCCGAGGA GCAGAGCGTG 780 

GTGCTGATCG TCGAGGACGA GGGCACGGGC ATTCCGCAGG CGATCATGGA CCGCCTGTTC 840 

GAACCCTTCT TCACCACCAA GGACCCCGGC AAGGGCACCG GTTTGGGGCT CGCGCTGGTC 900

TATTCGATCG TGGAAGAGCA TTATGGGCAG ATCACCATCG ACAGCCCGGC CGATCCCGAG 960 

CACGAGCGCG GAACCCGTTT CCGCGTGACC CTGCCGCGCT ATGTCGAAGC GACGTCCACA 1020 

GCGACCTGA 1029



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 342 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



Met Gly Val Cys Ser Leu Ala Lys Asp Gln Glu Val Leu Met Trp Asn
1               5                   10                  15

Arg Ala Met Glu Glu Leu Thr Gly Ile Ser Ala Gln Gln Val Val Gly
            20                  25                  30

Ser Arg Leu Leu Ser Leu Glu His Pro Trp Arg Glu Leu Leu Gln Asp
        35                  40                  45

Phe Ile Ala Gln Asp Glu Glu His Leu His Lys Gln His Leu Gln Leu
    50                  55                  60

Asp Gly Glu Val Arg Trp Leu Asn Leu His Lys Ala Ala Ile Asp Glu
65                  70                  75                  80

Pro Leu Ala Pro Gly Asn Ser Gly Leu Val Leu Leu Val Glu Asp Val
                85                  90                  95

Thr Glu Thr Arg Val Leu Glu Asp Gln Leu Val His Ser Glu Arg Leu
            100                 105                 110

Ala Ser Ile Gly Arg Leu Ala Ala Gly Val Ala His Glu Ile Gly Asn
        115                 120                 125

Pro Val Thr Gly Ile Ala Cys Leu Ala Gln Asn Leu Arg Glu Glu Arg
    130                 135                 140

Glu Gly Asp Glu Glu Leu Gly Glu Ile Ser Asn Gln Ile Leu Asp Gln
145                 150                 155                 160

Thr Lys Arg Ile Ser Arg Ile Val Gln Ser Leu Met Asn Phe Ala His
                165                 170                 175

Ala Gly Gln Gln Gln Arg Ala Glu Tyr Pro Val Ser Leu Ala Glu Val
            180                 185                 190

Ala Gln Asp Ala Ile Gly Leu Leu Ser Leu Asn Arg His Gly Thr Glu
        195                 200                 205

Val Gln Phe Tyr Asn Leu Cys Asp Pro Glu His Leu Ala Lys Gly Asp
    210                 215                 220

Pro Gln Arg Leu Ala Gln Val Leu Ile Asn Leu Leu Ser Asn Ala Arg
225                 230                 235                 240

Asp Ala Ser Pro Ala Gly Gly Ala Ile Arg Val Arg Ser Glu Ala Glu
                245                 250                 255

Glu Gln Ser Val Val Leu Ile Val Glu Asp Glu Gly Thr Gly Ile Pro
            260                 265                 270

Gln Ala Ile Met Asp Arg Leu Phe Glu Pro Phe Phe Thr Thr Lys Asp
        275                 280                 285

Pro Gly Lys Gly Thr Gly Leu Gly Leu Ala Leu Val Tyr Ser Ile Val
    290                 295                 300

Glu Glu His Tyr Gly Gln Ile Thr Ile Asp Ser Pro Ala Asp Pro Glu
305                 310                 315                 320

His Gln Arg Gly Thr Arg Phe Arg Val Thr Leu Pro Arg Tyr Val Glu
                325                 330                 335

Ala Thr Ser Thr Ala Thr
            340


(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1416 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATGCCGCATA TCCTCATCGT CGAAGACGAA ACCATCATCC GCTCCGCCCT GCGCCGCCTG 60 

CTGGAACGCA ACCAGTACCA GGTCAGCGAG GCCGGTTCGG TTCAGGAGGC CCAGGAGCGC 120 

TACAGCATTC GGACCTTCGA CCTGGTGGTC AGCGACCTGC GCCTGCCCGG CGCCCCCGGC 180 

ACCGAGCTGA TCAAGCTGGC CGACGGCACC CCGGTACTGA TCATGACCAG CTATGCCAGC 240 

CTGCGCTCGG CGGTGGACTC GATGAAGATG GGCGCGGTGG ACTACATCGC CAAGCCCTTC 300 

GATCACGACG AGATGCTCCA GGCCGTGGCG CGTATCCTGC GCGATCACCA GGAGGCCAAG 360 

CGCAACCCGC CAAGCGAGGC GCCCAGCAAG TCCGCCGGCA AGGGCAACGG CGCCACCGCC 420 

GAGGGCGAGA TCGGCATCAT CGGCTCCTGC GCCGCCATGC AGGACCTTTA CGGCAAGATC 480 

CGCAAGGTCG CTCCCACCGA TTCCAACGTA CTGATCCAGG GCGAGTCCGG CACCGGCAAG 540 

GAGCTGGTCG CGCGTGCGCT GCACAACCTC TCGCGTCGCG CCAAGGCACC GCTGATCTCG 600 

GTGAACTGCG CGGCCATCCC CGAGACCCTG ATCGAGTCCG AACTGTTCGG CCACGAGAAA 660 

GGTGCCTTCA CCGGCGCCAG CGCCGGCCGC GCCGGCCTGG TCGAAGCGGC CGACGGCGGC 720

ACCCTGTTCC TCGACGAGAT CGGCGAGCTG CCGCTGGAGG CGCAGGCCCG CCTGCTGCGC 780 

GTGCTGCAGG AGGGCGAGAT CCGTCGGGTC GGCTCGGTGC AGTCACAGAA GGTCGATGTA 840 

CGCCTGATCG CCGCTACCCA CCGCGACCTC AAGACGCTGG CCAAGACCGG CCAGTTCCGC 900 

GAGGACCTCT ACTACCGCCT GCACGTCATC GCCCTCAAGC TGCCGCCACT GCGCGAGCGC 960 

GGCGCCGACG TCAACGAGAT CGCCCGCGCC TTCCTCGTCC GCCAGTGCCA GCGCATGGGC 1020 

CGCGAGGACC TGCGCTTCGC TCAGGATGCC GAGCAGGCGA TCCGCCACTA CCCCTGGCCG 1080 

GGCAACGTGC GCGAGCTGGA GAATGCCATC GAGCGCGCGG TGATCCTCTG CGAGGGCGCG 1140 

GAAATTTCCG CCGAGCTGCT GGGCATCGAC ATCGAGCTGG ACGACCTGGA GGACGGCGAC 1200 

TTCGGCGAAC AGCCACAGCA GACCGCGGCC AACCACGAAC CGACCGAGGA CCTGTCGCTG 1260 

GAGGACTACT TCCAGCACTT CGTACTGGAG CACCAGGATC ACATGACCGA GACCGAACTG 1320 

GCGCGCAAGC TCGGCATCAG CCGCAAGTGC CTGTGGGAGC GCCGTCAGCG CCTGGGCATT 1380 

CCGCGGCGCA AGTCGGGCGC GGCGACCGGC TCCTGA 1416 
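
Each nucleotide listing can be cross-checked against the amino acid listing that follows it by translating the open reading frame with the standard genetic code. The Python sketch below is a minimal illustration of that check: the names are hypothetical, it assumes the standard codon table applies, and it only demonstrates the first 30 nucleotides of SEQ ID NO: 3 together with the length arithmetic for SEQ ID NOs: 1 and 3.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) up to the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


def expected_protein_length(orf_length_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    return orf_length_bp // 3 - 1


# First 30 nucleotides of SEQ ID NO: 3.
print(translate("ATGCCGCATA TCCTCATCGT CGAAGACGAA"))
print(expected_protein_length(1029), expected_protein_length(1416))
```

The translated prefix corresponds to Met Pro His Ile Leu Ile Val Glu Asp Glu, and the length arithmetic matches the 342 and 471 residues given for SEQ ID NO: 2 and SEQ ID NO: 4.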



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 471 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
Met Pro His Ile Leu Ile Val Glu Asp Glu Thr Ile Ile Arg Ser Ala
1               5                   10                  15

Leu Arg Arg Leu Leu Glu Arg Asn Gln Tyr Gln Val Ser Glu Ala Gly
            20                  25                  30

Ser Val Gln Glu Ala Gln Glu Arg Tyr Ser Ile Pro Thr Phe Asp Leu
        35                  40                  45

Val Val Ser Asp Leu Arg Leu Pro Gly Ala Pro Gly Thr Glu Leu Ile
    50                  55                  60

Lys Leu Ala Asp Gly Thr Pro Val Leu Ile Met Thr Ser Tyr Ala Ser
65                  70                  75                  80

Leu Arg Ser Ala Val Asp Ser Met Lys Met Gly Ala Val Asp Tyr Ile
                85                  90                  95

Ala Lys Pro Phe Asp His Asp Glu Met Leu Gln Ala Val Ala Arg Ile
            100                 105                 110

Leu Arg Asp His Gln Glu Ala Lys Arg Asn Pro Pro Ser Glu Ala Pro
        115                 120                 125

Ser Lys Ser Ala Gly Lys Gly Asn Gly Ala Thr Ala Glu Gly Glu Ile
    130                 135                 140

Gly Ile Ile Gly Ser Cys Ala Ala Met Gln Asp Leu Tyr Gly Lys Ile
145                 150                 155                 160

Arg Lys Val Ala Pro Thr Asp Ser Asn Val Leu Ile Gln Gly Glu Ser
                165                 170                 175

Gly Thr Gly Lys Glu Leu Val Ala Arg Ala Leu His Asn Leu Ser Arg
            180                 185                 190

Arg Ala Lys Ala Pro Leu Ile Ser Val Asn Cys Ala Ala Ile Pro Glu
        195                 200                 205

Thr Leu Ile Glu Ser Glu Leu Phe Gly His Glu Lys Gly Ala Phe Thr
    210                 215                 220

Gly Ala Ser Ala Gly Arg Ala Gly Leu Val Glu Ala Ala Asp Gly Gly
225                 230                 235                 240

Thr Leu Phe Leu Asp Glu Ile Gly Glu Leu Pro Leu Glu Ala Gln Ala
                245                 250                 255

Arg Leu Leu Arg Val Leu Gln Glu Gly Glu Ile Arg Arg Val Gly Ser
            260                 265                 270

Val Gln Ser Gln Lys Val Asp Val Arg Leu Ile Ala Ala Thr His Arg
        275                 280                 285

Asp Leu Lys Thr Leu Ala Lys Thr Gly Gln Phe Arg Glu Asp Leu Tyr
    290                 295                 300

Tyr Arg Leu His Val Ile Ala Leu Lys Leu Pro Pro Leu Arg Glu Arg
305                 310                 315                 320

Gly Ala Asp Val Asn Glu Ile Ala Arg Ala Phe Leu Val Arg Gln Cys
                325                 330                 335

Gln Arg Met Gly Arg Glu Asp Leu Arg Phe Ala Gln Asp Ala Glu Gln
            340                 345                 350

Ala Ile Arg His Tyr Pro Trp Pro Gly Asn Val Arg Glu Leu Glu Asn
        355                 360                 365

Ala Ile Glu Arg Ala Val Ile Leu Cys Glu Gly Ala Glu Ile Ser Ala
    370                 375                 380

Glu Leu Leu Gly Ile Asp Ile Glu Leu Asp Asp Leu Glu Asp Gly Asp
385                 390                 395                 400

Phe Gly Glu Gln Pro Gln Gln Thr Ala Ala Asn His Glu Pro Thr Glu
                405                 410                 415

Asp Leu Ser Leu Glu Asp Tyr Phe Gln His Phe Val Leu Glu His Gln
            420                 425                 430

Asp His Met Thr Glu Thr Glu Leu Ala Arg Lys Leu Gly Ile Ser Arg
        435                 440                 445

Lys Cys Leu Trp Glu Arg Arg Gln Arg Leu Gly Ile Pro Arg Arg Lys
    450                 455                 460

Ser Gly Ala Ala Thr Gly Ser
465                 470

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GCCTGGAGGA TTACCAGTC 19 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1512 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

ATGTCCACCG ATACCCACGC CGCCCTGACG GCTCCCGCAA GCCCCGCCTT GCGCCCGCTG 60 

CCCTTCGCCT TCGCCAAACG CCACGGCGTG CTGCTGCGCG AGCCCTTCGG CCAGGTCCAG 120 

CTGCAGGTGC GCCGCGGTGC CAGCCTGGCC GCCGTGCAGG AGGCCCAGCG CTTCGCCGGC 180 

CGCGTGCTGC CGCTGCACTG GCTGGAGCCC GAGGCCTTCG AGCAGGAGCT GGCCCTGGCC 240 

TACCAGCGCG ACTCCTCCGA GGTGCGGCAG ATGGCCGAGG GCATGGGTGC CGAACTTGAC 300 

CTAGCCAGCC TGGCCGAACT CACTCCCGAA TCCGGCGACC TGCTGGAGCA GGAAGATGAC 360 

GCGCCGATCA TCCGCCTGAT CAACGCCATC CTCAGCGAGG CGATCAAGGC CGGCGCCTCC 420 

GACATCCACC TGGAAACCTT CGAGAAACGC CTGGTGGTGC GCTTTCGCGT CGACGGCATC 480 

CTCCGCGAAG TGATCGAACC GCGCCGCGAG CTGGCGGCGC TGCTGGTCTC GCGGGTCAAG 540 

GTCATGGCGC GCCTGGACAT CGCCGAGAAG CGCGTACCGC AGGACGGCCG TATTTCGCTC 600 

AAGGTCGGCG GTCGCGAGGT GGATATCCGC GTCTCCACCC TGCCGTCGGC CAACGGCGAG 660 

CGGGTGGTGC TGCGTCTGCT CGACAAGCAG GCCGGGCGCC TGTCGCTCAC GCATCTGGGC 720 

ATGAGCGAGC GCGACCGCCG CCTGCTCGAC GACAACCTGC GCAAGCCGCA CGGCATCATC 780

CTAGTCACCG GCCCCACCGG CTCGGGCAAG ACCACCACCC TGTACGCCGG CCTGGTCACC 840 

CTCAACGACC GCTCGCGCAA TATCCTCACG GTGGAAGACC CGATCGAGTA CTACCTGGAA 900 

GGCATCGGCC AGACCCAGGT CAACCCGCGG GTGGACATGA CCTTCGCCCG CGGCCTGCGC 960 

GCCATCCTGC GCCAGGACCC GGACGTGGTG ATGGTCGGCG AGATCCGCGA CCAGGAGACC 1020 

GCCGACATCG CCGTGCAGGC CTCGCTCACC GGCCACCTGG TGCTCTCCAC CCTGCACACC 1080 

AACAGCGCCG TCGGCGCCGT CACCCGCCTG GTCGACATGG GCGTCGAGCC CTTCCTGCTG 1140 

TCGTCGTCCC TGCTCGGCGT GCTGGCCCAG CGCCTGGTGC GCGTGCTCTG CGTGCACTGC 1200 

CGCGAGGCGC GCCCGGCTGA CGCGGCCGAG TGCGGCCTGC TCGGCCTCGA CCCGCACAGC 1260 

CAGCCCCTGA TCTACCACGC CAAGGGCTGC CCGGAGTGCC ACCAGCAGGG CTACCGCGGC 1320 

CGTACTGGCA TCTACGAGCT GGTGATCTTC GACGACCAGA TGCGCACCCT GGTGCACAAC 1380 

GGCGCCGGTG AGCAGGAGCT GATTCGCCAC GCCCGCAGCC TCGGCCCGAG CATCCGCGAC 1440

GATGGCCGGC GCAAGGTGCT GGAAGGGGTG ACCAGCCTGG AAGAAGTGTT GCGCGTGACC 1500 

CGGGAAGACT GA 1512 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 503 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

Met Ser Thr Asp Thr His Ala Ala Leu Thr Ala Pro Ala Ser Pro Ala 
1 5 10 15 

Leu Arg Pro Leu Pro Phe Ala Phe Ala Lys Arg His Gly Val Leu Leu 
20 25 30 

Arg Glu Pro Phe Gly Gln Val Gln Leu Gln Val Arg Arg Gly Ala Ser 
35 40 45 

Leu Ala Ala Val Gln Glu Ala Gln Arg Phe Ala Gly Arg Val Leu Pro 
50 55 60 

Leu His Trp Leu Glu Pro Glu Ala Phe Glu Gln Glu Leu Ala Leu Ala 
65 70 75 80 

Tyr Gln Arg Asp Ser Ser Glu Val Arg Gln Met Ala Glu Gly Met Gly 
85 90 95 

Ala Glu Leu Asp Leu Ala Ser Leu Ala Glu Leu Thr Pro Glu Ser Gly 
100 105 110 

Asp Leu Leu Glu Gln Glu Asp Asp Ala Pro Ile Ile Arg Leu Ile Asn 
115 120 125 

Ala Ile Leu Ser Glu Ala Ile Lys Ala Gly Ala Ser Asp Ile His Leu 
130 135 140 

Glu Thr Phe Glu Lys Arg Leu Val Val Arg Phe Arg Val Asp Gly Ile 
145 150 155 160 

Leu Arg Glu Val Ile Glu Pro Arg Arg Glu Leu Ala Ala Leu Leu Val 
165 170 175 

Ser Arg Val Lys Val Met Ala Arg Leu Asp Ile Ala Glu Lys Arg Val 
180 185 190 

Pro Gln Asp Gly Arg Ile Ser Leu Lys Val Gly Gly Arg Glu Val Asp 
195 200 205 

Ile Arg Val Ser Thr Leu Pro Ser Ala Asn Gly Glu Arg Val Val Leu 
210 215 220 

Arg Leu Leu Asp Lys Gln Ala Gly Arg Leu Ser Leu Thr His Leu Gly 
225 230 235 240 

Met Ser Glu Arg Asp Arg Arg Leu Leu Asp Asp Asn Leu Arg Lys Pro 
245 250 255 

His Gly Ile Ile Leu Val Thr Gly Pro Thr Gly Ser Gly Lys Thr Thr 
260 265 270 

Thr Leu Tyr Ala Gly Leu Val Thr Leu Asn Asp Arg Ser Arg Asn Ile 
275 280 285 

Leu Thr Val Glu Asp Pro Ile Glu Tyr Tyr Leu Glu Gly Ile Gly Gln 
290 295 300 

Thr Gln Val Asn Pro Arg Val Asp Met Thr Phe Ala Arg Gly Leu Arg 
305 310 315 320 

Ala Ile Leu Arg Gln Asp Pro Asp Val Val Met Val Gly Glu Ile Arg 
325 330 335 

Asp Gln Glu Thr Ala Asp Ile Ala Val Gln Ala Ser Leu Thr Gly His 
340 345 350 

Leu Val Leu Ser Thr Leu His Thr Asn Ser Ala Val Gly Ala Val Thr 
355 360 365 

Arg Leu Val Asp Met Gly Val Glu Pro Phe Leu Leu Ser Ser Ser Leu 
370 375 380 

Leu Gly Val Leu Ala Gln Arg Leu Val Arg Val Leu Cys Val His Cys 
385 390 395 400 

Arg Glu Ala Arg Pro Ala Asp Ala Ala Glu Cys Gly Leu Leu Gly Leu 
405 410 415 

Asp Pro His Ser Gln Pro Leu Ile Tyr His Ala Lys Gly Cys Pro Glu 
420 425 430 

Cys His Gln Gln Gly Tyr Arg Gly Arg Thr Gly Ile Tyr Glu Leu Val 
435 440 445 

Ile Phe Asp Asp Gln Met Arg Thr Leu Val His Asn Gly Ala Gly Glu 
450 455 460 

Gln Glu Leu Ile Arg His Ala Arg Ser Leu Gly Pro Ser Ile Arg Asp 
465 470 475 480 

Asp Gly Arg Arg Lys Val Leu Glu Gly Val Thr Ser Leu Glu Glu Val 
485 490 495 

Leu Arg Val Thr Arg Glu Asp 
500 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1215 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ATGGCCGCCT TCGAATACAT CGCCCTGGAT GCCAGGGGCC GCCAGCAGAA GGGCGTGCTG 60 

GAGGGCGACA GCGCCCGCCA GGTGCGCCAG CTGCTGCGCG ACAAACAGTT GTCGCCGCTG 120 

CAGGTCGAGC CGGTACAGCG CAGGGAGCAG GCCGAGGCTG GTGGCTTCAG CCTGCGCCGT 180 

GGCCTGTCGG CGCGCGACCT GGCGCTGGTC ACCCGTCAGC TGGCGACCCT GATCGGCGCC 240 

GCGCTGCCCA TCGAGGAAGC GCTGCGCGCC GCCGCCGCGC AGTCGCGCCA GCCGCGCATC 300 

CAGTCGATGC TGTTGGCGGT GCGCGCCAAG GTGCTCGAGG GCCACAGCCT GGCCAAGGCC 360 

CTGGCCTCCT ACCCGGCGGC CTTCCCCGAG CTGTACCGCG CCACGGTGGC GGCCGGCGAG 420 

CATGCGGGGC ACCTGGCGCC GGTGCTGGAG CAGCTGGCCG ACTACACCGA GCAGCGCCAG 480 

CAGTCGCGGC AGAAGATCCA GATGGCGCTG CTCTACCCGG TGATCCTGAT GCTCGCTTCG 540 

CTGGGCATCG TCGGTTTTCT GCTCGGCTAC GTGGTGCCGG ATGTGGTGCG GGTGTTCGTC 600 

GACTCCGGGC AGACCCTGCC GGCGCTGACC CGCGGGCTGA TTTTCCTCAG CGAGCTGGTC 660 

AAGTCCTGGG GCGCCCTGGC CATCGTCCTG GCGGTGCTCG GCGTGCTCGC CTTTCGCCGC 720 

GCCTTGCGCA GCGAGGATCT GCGCCGGCGC TGGCATGCCT TCCTGCTGCG CGTGCCGCTG 780 

GTCGGTGGGC TGATCGCCGC CACCGAGACG GCACGCTTCG CCTCGACCCT GGCCATCCTG 840 

GTGCGCAGCG GCGTGCCACT GGTGGAGGCG CTGGCCATCG GCGCCGAGGT GGTGTCCAAC 900 

CTGATCATCC GCAGCGACGT GGCCAACGCC ACCCAGCGCG TGCGCGAGGG CGGCAGCCTG 960 

TCGCGCGCGC TGGAAGCCAG CCGGCAGTTT CCGCCGATGA TGCTGCACAT GATCGCCAGC 1020 

GGCGAGCGTT CCGGCGAGCT GGACCAGATG CTGGCGCGCA CGGCGCGCAA CCAGGAAAAC 1080 

GACCTGGCGG CCACCATCGG CCTGCTGGTG GGGCTGTTCG AGCCGTTCAT GCTGGTATTC 1140 

ATGGGCGCGG TGGTGCTGGT GATCGTGCTG GCCATCCTGC TGCCGATTCT TTCTCTGAAC 1200 

CAACTGGTGG GTTGA 1215 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Ala Ala Phe Glu Tyr Ile Ala Leu Asp Ala Arg Gly Arg Gln Gln 
1 5 10 15 

Lys Gly Val Leu Glu Gly Asp Ser Ala Arg Gln Val Arg Gln Leu Leu 
20 25 30 

Arg Asp Lys Gln Leu Ser Pro Leu Gln Val Glu Pro Val Gln Arg Arg 
35 40 45 

Glu Gln Ala Glu Ala Gly Gly Phe Ser Leu Arg Arg Gly Leu Ser Ala 
50 55 60 

Arg Asp Leu Ala Leu Val Thr Arg Gln Leu Ala Thr Leu Ile Gly Ala 
65 70 75 80 

Ala Leu Pro Ile Glu Glu Ala Leu Arg Ala Ala Ala Ala Gln Ser Arg 
85 90 95 

Gln Pro Arg Ile Gln Ser Met Leu Leu Ala Val Arg Ala Lys Val Leu 
100 105 110 

Glu Gly His Ser Leu Ala Lys Ala Leu Ala Ser Tyr Pro Ala Ala Phe 
115 120 125 

Pro Glu Leu Tyr Arg Ala Thr Val Ala Ala Gly Glu His Ala Gly His 
130 135 140 

Leu Ala Pro Val Leu Glu Gln Leu Ala Asp Tyr Thr Glu Gln Arg Gln 
145 150 155 160 

Gln Ser Arg Gln Lys Ile Gln Met Ala Leu Leu Tyr Pro Val Ile Leu 
165 170 175 

Met Leu Ala Ser Leu Gly Ile Val Gly Phe Leu Leu Gly Tyr Val Val 
180 185 190 

Pro Asp Val Val Arg Val Phe Val Asp Ser Gly Gln Thr Leu Pro Ala 
195 200 205 

Leu Thr Arg Gly Leu Ile Phe Leu Ser Glu Leu Val Lys Ser Trp Gly 
210 215 220 

Ala Leu Ala Ile Val Leu Ala Val Leu Gly Val Leu Ala Phe Arg Arg 
225 230 235 240 

Ala Leu Arg Ser Glu Asp Leu Arg Arg Arg Trp His Ala Phe Leu Leu 
245 250 255 

Arg Val Pro Leu Val Gly Gly Leu Ile Ala Ala Thr Glu Thr Ala Arg 
260 265 270 

Phe Ala Ser Thr Leu Ala Ile Leu Val Arg Ser Gly Val Pro Leu Val 
275 280 285 

Glu Ala Leu Ala Ile Gly Ala Glu Val Val Ser Asn Leu Ile Ile Arg 
290 295 300 

Ser Asp Val Ala Asn Ala Thr Gln Arg Val Arg Glu Gly Gly Ser Leu 
305 310 315 320 

Ser Arg Ala Leu Glu Ala Ser Arg Gln Phe Pro Pro Met Met Leu His 
325 330 335 

Met Ile Ala Ser Gly Glu Arg Ser Gly Glu Leu Asp Gln Met Leu Ala 
340 345 350 

Arg Thr Ala Arg Asn Gln Glu Asn Asp Leu Ala Ala Thr Ile Gly Leu 
355 360 365 

Leu Val Gly Leu Phe Glu Pro Phe Met Leu Val Phe Met Gly Ala Val 
370 375 380 

Val Leu Val Ile Val Leu Ala Ile Leu Leu Pro Ile Leu Ser Leu Asn 
385 390 395 400 

Gln Leu Val Gly 

















(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 423 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
ATGTACAAAC AGAAAGGCTT CACGCTGATC GAAATCATGG TGGTGGTGGT CATCCTCGGC 60 

ATTCTCGCTG CCCTGGTGGT GCCGCAGGTG ATGGGCCGCC CGGACCAGGC CAAGGTCACC 120 

GCGGCGCAGA ACGACATCCG CGCCATCGGC GCCGCGCTGG ACATGTACAA GCTGGACAAC 180 

CAGAACTACC CGAGCACCCA GCAGGGCCTG GAGGCCCTGG TGAAGAAACC CACCGGCACG 240 

CCGGCGGCGA AGAACTGGAA CGCCGAGGGC TACCTGAAGA AGCTGCCGGT CGACCCCTGG 300 

GGCAACCAGT ACCTGTACCT GTCGCCGGGC ACCCGCGGCA AGATCGACCT GTATTCGCTG 360 

GGCGCCGACG GCCAGGAAGG CGGCGAGGGG ACCGACGCCG ACATCGGCAA CTGGGATCTC 420 

TGA 423 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 140 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



Met Tyr Lys Gln Lys Gly Phe Thr Leu Ile Glu Ile Met Val Val Val 
1 5 10 15 

Val Ile Leu Gly Ile Leu Ala Ala Leu Val Val Pro Gln Val Met Gly 
20 25 30 

Arg Pro Asp Gln Ala Lys Val Thr Ala Ala Gln Asn Asp Ile Arg Ala 
35 40 45 

Ile Gly Ala Ala Leu Asp Met Tyr Lys Leu Asp Asn Gln Asn Tyr Pro 
50 55 60 

Ser Thr Gln Gln Gly Leu Glu Ala Leu Val Lys Lys Pro Thr Gly Thr 
65 70 75 80 

Pro Ala Ala Lys Asn Trp Asn Ala Glu Gly Tyr Leu Lys Lys Leu Pro 
85 90 95 

Val Asp Pro Trp Gly Asn Gln Tyr Leu Tyr Leu Ser Pro Gly Thr Arg 
100 105 110 

Gly Lys Ile Asp Leu Tyr Ser Leu Gly Ala Asp Gly Gln Glu Gly Gly 
115 120 125 

Glu Gly Thr Asp Ala Asp Ile Gly Asn Trp Asp Leu 
130 135 140 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 642 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

TTGAGTAGCA CCCGCACCCG CCTGCCCGCC TGGCTGCAGC GCCACGGCGT GACCGGCCTC 60 

TGCCTGCTCG TGGTGCTGCT CATCACCCTC AGCCTGAGCA AGCAGAGCAT CGACTTCCTT 120 

CGCCTGCTGC GCAGCGAGGC CGCGCCACCG CCCGCCCCAG AGAGCATCGC CGAGCGCCAG 180 

CCGCTGTCCA TCCAGCGCCT GCAGCATCTG TTCGGCACGC CCGCGGCCAG GCCGCGCGGC 240 

GACCAGGCCG CCCCCGCCAC CCGGCAGCAG ATGACCCTGC TGGCCAGCTT CGTCAACCCG 300 

GACGCCAAGC GCTCCACGGC GATCATCCAG GTCGCCGGCG ACAAACCCAA GCGCATCGCC 360 

GTGGGCGAAT CGGTCAACGT CAGCACCCGC CTGCAGGCCG TCTATCAGGA CCACGTGGTG 420 

CTCGACCGCG GCGGCGTCGA GGAGAGCCTG CGCTTCCCCG CCGTGCGCCA GCCCTCTCTG 480 

ACGCCGGCCT ACTCGGCGCT GGAGCCCACC GCCAGCCAAC TGGAACAGCT GCAGGACGAA 540 

GACGTCCAGG CCCTGCAGGA GCGCATCCAG ACCCTTCAAC AACGCATGGA AGGCGGCGAC 600 

ATCCCGCAGC CCGAAATACC GGAAGCCGAA GACAGCCCAT GA 642 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 213 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 



Met Ser Ser Thr Arg Thr Arg Leu Pro Ala Trp Leu Gln Arg His Gly 
1 5 10 15 

Val Thr Gly Leu Cys Leu Leu Val Val Leu Leu Ile Thr Leu Ser Leu 
20 25 30 

Ser Lys Gln Ser Ile Asp Phe Leu Arg Leu Leu Arg Ser Glu Ala Ala 
35 40 45 

Pro Pro Pro Ala Pro Glu Ser Ile Ala Glu Arg Gln Pro Leu Ser Ile 
50 55 60 

Gln Arg Leu Gln His Leu Phe Gly Thr Pro Ala Ala Arg Pro Arg Gly 
65 70 75 80 

Asp Gln Ala Ala Pro Ala Thr Arg Gln Gln Met Thr Leu Leu Ala Ser 
85 90 95 

Phe Val Asn Pro Asp Ala Lys Arg Ser Thr Ala Ile Ile Gln Val Ala 
100 105 110 

Gly Asp Lys Pro Lys Arg Ile Ala Val Gly Glu Ser Val Asn Val Ser 
115 120 125 

Thr Arg Leu Gln Ala Val Tyr Gln Asp His Val Val Leu Asp Arg Gly 
130 135 140 

Gly Val Glu Glu Ser Leu Arg Phe Pro Ala Val Arg Gln Pro Ser Leu 
145 150 155 160 

Thr Pro Ala Tyr Ser Ala Leu Glu Pro Thr Ala Ser Gln Leu Glu Gln 
165 170 175 

Leu Gln Asp Glu Asp Val Gln Ala Leu Gln Glu Arg Ile Gln Thr Leu 
180 185 190 

Gln Gln Arg Met Glu Gly Gly Asp Ile Pro Gln Pro Glu Ile Pro Glu 
195 200 205 

Ala Glu Asp Ser Pro 
210 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1950 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

ATGATCGACT CCAGAATTCC GCCGCACAAA CGCCTGCCCC TCGCCCTGCT GCTGGCCGCG 60 

AGCTGCCTCG CCGCCCCGCT GCCGCTCGTC CATGCCGCCG AGCCGGTGGC GGTGAGCCAG 120 

GGCGCCGAGA CCTGGACCAT CAACATGAAG GACGCCGATA TCCGCGACTT CATCGACCAG 180 

GTGGCGCAGA TCTCTGGCGA GACCTTCGTC GTCGATCCGC GGGTCAAGGG CCAGGTCACG 240 

GTGATCTCCA AGACCCCGCT GGGCCTCGAG GAGGTCTACC AGCTGTTCCT TTCGGTGATG 300 

AGCACCCATG GCTTCAGCGT GCTGGCACAG GGCGACCAGG CGCGCATCGT GCCGGTCACC 360 

GAGGCGCGTA GCGGCGCCAA CAGCAGCCGC AGCGCGCCGG ACGATGTGCA GACCGAGCTG 420 

ATCCAGGTGC AGCACACCTC GGTCAACGAA CTGATCCCGC TGATCCGCCC GCTGGTGCCG 480 

CAGAACGGCC ACCTGGCGGC GGTCGCCGCC TCCAACGCGC TGATCATCAG CGACCGCCGG 540 

GCNAATATCG AACGCATCCG CGAACTGATC GCCGAGCTCG ATGCCCAGGG CGGCGGCGAC 600 

TACAACGTGA TCAACCTGCA GCATGCCTGG GTACTGGACG CCGCCGAGGC ACTGAACAAC 660 

GCGGTGATGC GCAACGAGAA AAACAGCGCC GGCACCCGGG TGATTGCCGA CGCCCGCACC 720 

AACCGCCTGA TCCTCCTCGG CCCGCCGGCC GCCCGCCAGC GCCTGGCCAA CCTGGCCCGC 780 

TCGCTGGACA TCCCCAGCAC CCGTTCGGCC AATGCGCGGG TAATTCGCCT ACGCCACAGC 840 

GACGCCAAGA GCCTGGCCGA GACCCTGGGC GACATCTCCG AGGGGTTGAA GACCGCGGAG 900 

GGTGGTGGCG AAGCCGCCAG CAGCAAGCCG CAGAACATCC TGATCCGCCC CGACGAGAGC 960 

CTCAATGCCC TGGTCCTGCT GGCCGATCCG GACACCGTGG CGACCCTCGA GGAAATCGTG 1020 

CGCAACCTCG ACGTGCCGCG CGCCCAGGTG ATGGTCGAGG CGGCCATCGT GGAAATCTCC 1080 

GGGGACATCA GCGACGCCCT CGGCGTGCAG TGGGCGGTGG ATGCCCGCGG CGGCACCGGC 1140 

GGCCTCGGCG GGGTCAACTT CGGCAATACC GGGCTATCGG TGGGCACCGT GCTCAAGGCC 1200 

ATCCAGAACG AGGAAATCCC CGATGACCTG ACCCTGCCGG ACGGCGCCAT CATCGGCATC 1260 

GGCACCGAGA ACTTCGGCGC GCTGATCACT GCCCTCTCTG CCAACAGCAA GAGCAACCTG 1320 

CTGTCCACGC CCAGCCTGCT GACCCTGGAC AACCAGGAGG CGGAAATCCT GGTCGGGCAG 1380 

AACGTGCCTT TCCAGACCGG CTCCTACACC ACCGACGCCT CGGGGGCGAA CAACCCCTTC 1440 

ACCACCATTG AGCGCGAGGA CATCGGCGTG ACCCTCAAGG TCACCCCGCA CATCAACGAC 1500 

GGCGCCACCC TGCGCCTGGA AGTGGAGCAG GAGATCTCCT CCATCGCCCC CAGCGCCGGG 1560 

GTCAATGCCC AGGCGGTGGA CCTGGTGACC AACAAGCGCT CGATCAAGAG CGTGATCCTG 1620 

GCCGACGACG GCCAGGTCAT AGTGCTGGGA GGGCTGATCC AGGACGACGT CACCAGCACC 1680 

GACTCCAAGG TGCCGCTGCT GGGTGACATC CCGCTGATCG GCCGGCTGTT CCGCTCGACC 1740 

AAGGACACCC ACGTCAAGCG CAACCTGATG GTGTTCCTGC GCCCGACCAT CGTCCGCGAC 1800 

CGCGCCGGCA TGGCCGCGCT GTCGGGCAAG AAGTACAGCG ACATCAGCGT GCTGGGTGCC 1860 

GACGAGGATG GCCACAGCAG CCTGCCGGGC AGCGCCGAGC GCCTGTTCGA CAAACCCGGC 1920 

GCCGGTGCCG TGGACCTGCG CGACCAGTGA 1950 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 649 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 



Met Ile Asp Ser Arg Ile Pro Pro His Lys Arg Leu Pro Leu Ala Leu 
1 5 10 15 

Leu Leu Ala Ala Ser Cys Leu Ala Ala Pro Leu Pro Leu Val His Ala 
20 25 30 

Ala Glu Pro Val Ala Val Ser Gln Gly Ala Glu Thr Trp Thr Ile Asn 
35 40 45 

Met Lys Asp Ala Asp Ile Arg Asp Phe Ile Asp Gln Val Ala Gln Ile 
50 55 60 

Ser Gly Glu Thr Phe Val Val Asp Pro Arg Val Lys Gly Gln Val Thr 
65 70 75 80 

Val Ile Ser Lys Thr Pro Leu Gly Leu Glu Glu Val Tyr Gln Leu Phe 
85 90 95 

Leu Ser Val Met Ser Thr His Gly Phe Ser Val Leu Ala Gln Gly Asp 
100 105 110 

Gln Ala Arg Ile Val Pro Val Thr Glu Ala Arg Ser Gly Ala Asn Ser 
115 120 125 

Ser Arg Ser Ala Pro Asp Asp Val Gln Thr Glu Leu Ile Gln Val Gln 
130 135 140 

His Thr Ser Val Asn Glu Leu Ile Pro Leu Ile Arg Pro Leu Val Pro 
145 150 155 160 

Gln Asn Gly His Leu Ala Ala Val Ala Ala Ser Asn Ala Leu Ile Ile 
165 170 175 

Ser Asp Arg Arg Ala Asn Ile Glu Arg Ile Arg Glu Leu Ile Ala Glu 
180 185 190 

Leu Asp Ala Gln Gly Gly Gly Asp Tyr Asn Val Ile Asn Leu Gln His 
195 200 205 

Ala Trp Val Leu Asp Ala Ala Glu Ala Leu Asn Asn Ala Val Met Arg 
210 215 220 

Asn Glu Lys Asn Ser Ala Gly Thr Arg Val Ile Ala Asp Ala Arg Thr 
225 230 235 240 

Asn Arg Leu Ile Leu Leu Gly Pro Pro Ala Ala Arg Gln Arg Leu Ala 
245 250 255 

Asn Leu Ala Arg Ser Leu Asp Ile Pro Ser Thr Arg Ser Ala Asn Ala 
260 265 270 

Arg Val Ile Arg Leu Arg His Ser Asp Ala Lys Ser Leu Ala Glu Thr 
275 280 285 

Leu Gly Asp Ile Ser Glu Gly Leu Lys Thr Ala Glu Gly Gly Gly Glu 
290 295 300 

Ala Ala Ser Ser Lys Pro Gln Asn Ile Leu Ile Arg Ala Asp Glu Ser 
305 310 315 320 

Leu Asn Ala Leu Val Leu Leu Ala Asp Pro Asp Thr Val Ala Thr Leu 
325 330 335 

Glu Glu Ile Val Arg Asn Leu Asp Val Pro Arg Ala Gln Val Met Val 
340 345 350 

Glu Ala Ala Ile Val Glu Ile Ser Gly Asp Ile Ser Asp Ala Leu Gly 
355 360 365 

Val Gln Trp Ala Val Asp Ala Arg Gly Gly Thr Gly Gly Leu Gly Gly 
370 375 380 

Val Asn Phe Gly Asn Thr Gly Leu Ser Val Gly Thr Val Leu Lys Ala 
385 390 395 400 

Ile Gln Asn Glu Glu Ile Pro Asp Asp Leu Thr Leu Pro Asp Gly Ala 
405 410 415 

Ile Ile Gly Ile Gly Thr Glu Asn Phe Gly Ala Leu Ile Thr Ala Leu 
420 425 430 

Ser Ala Asn Ser Lys Ser Asn Leu Leu Ser Thr Pro Ser Leu Leu Thr 
435 440 445 

Leu Asp Asn Gln Glu Ala Glu Ile Leu Val Gly Gln Asn Val Pro Phe 
450 455 460 

Gln Thr Gly Ser Tyr Thr Thr Asp Ala Ser Gly Ala Asn Asn Pro Phe 
465 470 475 480 

Thr Thr Ile Glu Arg Glu Asp Ile Gly Val Thr Leu Lys Val Thr Pro 
485 490 495 

His Ile Asn Asp Gly Ala Thr Leu Arg Leu Glu Val Glu Gln Glu Ile 
500 505 510 

Ser Ser Ile Ala Pro Ser Ala Gly Val Asn Ala Gln Ala Val Asp Leu 
515 520 525 

Val Thr Asn Lys Arg Ser Ile Lys Ser Val Ile Leu Ala Asp Asp Gly 
530 535 540 

Gln Val Ile Val Leu Gly Gly Leu Ile Gln Asp Asp Val Thr Ser Thr 
545 550 555 560 

Asp Ser Lys Val Pro Leu Leu Gly Asp Ile Pro Leu Ile Gly Arg Leu 
565 570 575 

Phe Arg Ser Thr Lys Asp Thr His Val Lys Arg Asn Leu Met Val Phe 
580 585 590 

Leu Arg Pro Thr Ile Val Arg Asp Arg Ala Gly Met Ala Ala Leu Ser 
595 600 605 

Gly Lys Lys Tyr Ser Asp Ile Ser Val Leu Gly Ala Asp Glu Asp Gly 
610 615 620 

His Ser Ser Leu Pro Gly Ser Ala Glu Arg Leu Phe Asp Lys Pro Gly 
625 630 635 640 

Ala Gly Ala Val Asp Leu Arg Asp Gln 
645 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2742 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

ATGTCTGTTT GGGTCACGTG GCCGGGCTTG GTCAAGTTCG GCACCCTGGG CATCTATGCC 60 

GGCCTGATCA CGCTCGCGCT TGAGCGCGAC GTGCTGTTCA AGAACAACCT GTTCGACGTC 120 

GACAACCTGC CCGCGGCCAA CGCCAGCATC ACCTGTGATG CCCGCAGCCA GGTGGCGCGT 180 

ACCGAGGACG GCACCTGTAA CATCCTCGCC AACCCGGCCG AGGGCTCGGT GTACCGCCGC 240 

TTCGGGCGCA ACGTCGACCC CAGCGTGACC CATGGCGAGA CCGAGGCCGA CACCCTGCTC 300 

AGTCCCAATC CGCGGGAGGT GAGTAACGTG CTGATGGCGC GTGGCGAGTT CAAGCCGGCG 360 

CCCAGCCTCA ACTTCATCGC CGCCTCCTGG ATCCAGTTCA TGGTGCATGA CTGGGTCGAA 420 

CACGGCGCCA ACGCCGAAGC CAACCCGATC CAGGTGCCGC TGCCGGCTGG CGACGCGCTC 480 

GGCTCCGGCA GCCTGTCCGT GCGCCGCACC CAGCCCGACC CGACCCGTAC CCCGGCCGAG 540 

GCCGGCAAGC CGGCCACCTA CCGCAACCAC AACACCCACT GGTGGGATGG CTCGCAGTTG 600 

TATGGCAGCA GCAAGGACAT CAACGACAAG GTGCGCGCCT TCGAGGGTGG CAAGCTGAAG 660 

ATCAATCCCG ACGGTACCCT GCCGACCGAG TTCCTCAGCG GCAAGCCGAT CACCGGCTTC 720 

AACGAGAACT GGTGGGTTGG CCTGAGCATG CTGCACCAGC TGTTCACTAA GGAGCACAAC 780 

GCCATCGCGG CGATGCTCCA GCAGAAGTAC CCGGACAAGG ACGACCAGTG GCTGTACGAC 840 

CATGCGCGCC TGGTCAACTC CGCGCTGATG GCCAAGATCC ACACCGTGGA ATGGACCCCG 900 

GCGGTGATCG CCAACCCGGT CACCGAACGC GCCATGTATG CCAACTGGTG GGGCCTGCTG 960 

GGTTCCGGTC CGGAGCGTGA CAAGTACCAG GAAGAGGCGC GCATGCTGCA GGAGGACCTG 1020 

GCCAGCTCCA ACTCCTTCGT CCTGCGCATT CTCGGCATCG ACGGCAGCCA GGCCGGCAGT 1080 

TCGGCCATCG ACCATGCCCT GGCCGGCATC GTCGGCTCGA CCAACCCGAA CAACTACGGC 1140 

GTGCCCTACA CCCTGACCGA GGAGTTCGTC GCGGTCTACC GCATGCACCC GCTGATGCGC 1200 

GACAAGGTCG ATGTCTACGA CATCGGCTCG AACATCATCG CGCGCAGCGT GCCGCTGCAG 1260 

GAGACCCGCG ATGCCGACGC CGAGGAGCTG CTGGCGGACG AGAATCCCGA GCGCCTGTGG 1320 

TACTCCTTCG GCATCACCAA CCCGGGCTCG CTGACCCTCA ACAACTACCC GAACTTCCTG 1380 

CGCAACCTGT CCATGCCGCT GGTCGGCAAC ATCGACCTGG CGACCATCGA CGTGCTGTGT 1440 

GACCGCGAGC GCGGGGTGCC GCGCTACAAC GAGTTCCGCC GCGAGATCGG CCTCAACCCG 1500 

ATCACCAAGT TGGAGGACCT GACCACCGAC CCGGCCACCC TGGCCAACCT CAAGCGCATC 1560 

TACGGCAACG ACATCGAGAA GATTGACACC CTGGTCGGCA TGCTGGCCGA GACCGTGCGT 1620 

CCGGACGGCT TCGCCTTCGG CGAGACGGCC TTCCAGATCT TCATCATGAA CGCCTCGCGG 1680 

CGCCTGATGA CCGACCGCTT CTATACCAAG GACTACCGCC CGGAGATCTA CACCGCCGAG 1740 

GGCCTGGCCT GGGTCGAGAA CACCACCATG GTCGACGTGC TCAAACGCCA CAATCCGCAG 1800 

CTGGTCAACA GCCTGGTTGG CGTGGAAAAC GCCTTCAAAC CCTGGGGCCT GAACATCCCG 1860 

GCCGACTACG AGAGCTGGCC GGGCAAGGCC AAGCAGGACA ACCTGTGGGT CAACGGCGCC 1920 

NTGCGCACCC AGTACGCCGC AGGCCAGCTG CCGGCCATTC CGCCGGTGGA CGTCGGCGGC 1980 

CTGATCAGTT CGGTGCTGTG GAAGAAGGTG CAGACCAANT CCGACGTGGC GCCGGCCGGC 2040 

TACGAGAAGG CCATGCACCC GCATGGCGTG ATGGCCAAGG TCAAGTTCAC CGCCGTGCCG 2100 

GGGCACCCCT ACACCGGCCT GTTCCAGGGT GCCGACAGCG GCCTGCTGCG CCTGTCGGTG 2160 

GCCGGCGACC CGGCAACCAA CGGCTTCCAG CCGGGTCTGG CGTGGAAGGC CTTCGTCGAC 2220 

GGCAAGCCGT CGCAGAACGT CTCCGCGCTC TACACCCTGA GCGGGCAGGG CAGCAACCAC 2280 

AACTTCTTCG CCAACGAGCT GTCGCAGTTC GTCCTGCCGG AGACCAACGA TACCCTGGGC 2340 

ACCACGCTGC TGTTCTCGCT GGTCAGCCTC AAGCCGACCT TGCTGCGCGT GGACGACATG 2400 

GCCGAAGTGA CCCAGACCGG CCAGGCCGTG ACTTCGGTCA AGGCGCCGAC GCAGATCTAC 2460 

TTCGTGCCCA AGCCGGAGCT GCGCAGCCTG TTCTCCAGTG CGGCGCATGA CTTCCGCAGC 2520 

GACCTGACGA GCCTCACCGC CGGCACCAAG CTGTACGACG TCTACGCTAC CTCGATGGAG 2580 

ATCAAGACCT CGATCCTGCC GTCGACCAAT CGTAGCTACG CCCAGCAACG GCGCAACAGC 2640 

GCGGTGAAGA TCGGCGAGAT GGAGCTGACC TCGCCGTTCA TCGCCTCGGC CTTCGGCGAC 2700 

AACGGGGTGT TCTTCAAGCA CCAGCGTCAC GAAGACAAAT AA 2742 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 913 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



Met Ser Val Trp Val Thr Trp Pro Gly Leu Val Lys Phe Gly Thr Leu 
1 5 10 15 

Gly Ile Tyr Ala Gly Leu Ile Thr Leu Ala Leu Glu Arg Asp Val Leu 
20 25 30 

Phe Lys Asn Asn Leu Phe Asp Val Asp Asn Leu Pro Ala Ala Asn Ala 
35 40 45 

Ser Ile Thr Cys Asp Ala Arg Ser Gln Val Ala Arg Thr Glu Asp Gly 
50 55 60 

Thr Cys Asn Ile Leu Ala Asn Pro Ala Glu Gly Ser Val Tyr Arg Arg 
65 70 75 80 

Phe Gly Arg Asn Val Asp Pro Ser Val Thr His Gly Glu Thr Glu Ala 
85 90 95 

Asp Thr Leu Leu Ser Pro Asn Pro Arg Glu Val Ser Asn Val Leu Met 
100 105 110 

Ala Arg Gly Glu Phe Lys Pro Ala Pro Ser Leu Asn Phe Ile Ala Ala 
115 120 125 

Ser Trp Ile Gln Phe Met Val His Asp Trp Val Glu His Gly Pro Asn 
130 135 140 

Ala Glu Ala Asn Pro Ile Gln Val Pro Leu Pro Ala Gly Asp Ala Leu 
145 150 155 160 

Gly Ser Gly Ser Leu Ser Val Arg Arg Thr Gln Pro Asp Pro Thr Arg 
165 170 175 

Thr Pro Ala Glu Ala Gly Lys Pro Ala Thr Tyr Arg Asn His Asn Thr 
180 185 190 

His Trp Trp Asp Gly Ser Gln Leu Tyr Gly Ser Ser Lys Asp Ile Asn 
195 200 205 

Asp Lys Val Arg Ala Phe Glu Gly Gly Lys Leu Lys Ile Asn Pro Asp 
210 215 220 

Gly Thr Leu Pro Thr Glu Phe Leu Ser Gly Lys Pro Ile Thr Gly Phe 
225 230 235 240 

Asn Glu Asn Trp Trp Val Gly Leu Ser Met Leu His Gln Leu Phe Thr 
245 250 255 

Lys Glu His Asn Ala Ile Ala Ala Met Leu Gln Gln Lys Tyr Pro Asp 
260 265 270 

Lys Asp Asp Gln Trp Leu Tyr Asp His Ala Arg Leu Val Asn Ser Ala 
275 280 285 

Leu Met Ala Lys Ile His Thr Val Glu Trp Thr Pro Ala Val Ile Ala 
290 295 300 

Asn Pro Val Thr Glu Arg Ala Met Tyr Ala Asn Trp Trp Gly Leu Leu 
305 310 315 320 

Gly Ser Gly Pro Glu Arg Asp Lys Tyr Gln Glu Glu Ala Arg Met Leu 
325 330 335 

Gln Glu Asp Leu Ala Ser Ser Asn Ser Phe Val Leu Arg Ile Leu Gly 
340 345 350 

Ile Asp Gly Ser Gln Ala Gly Ser Ser Ala Ile Asp His Ala Leu Ala 
355 360 365 

Gly Ile Val Gly Ser Thr Asn Pro Asn Asn Tyr Gly Val Pro Tyr Thr 
370 375 380 

Leu Thr Glu Glu Phe Val Ala Val Tyr Arg Met His Pro Leu Met Arg 
385 390 395 400 

Asp Lys Val Asp Val Tyr Asp Ile Gly Ser Asn Ile Ile Ala Arg Ser 
405 410 415 

Val Pro Leu Gln Glu Thr Arg Asp Ala Asp Ala Glu Glu Leu Leu Ala 
420 425 430 

Asp Glu Asn Pro Glu Arg Leu Trp Tyr Ser Phe Gly Ile Thr Asn Pro 
435 440 445 

Gly Ser Leu Thr Leu Asn Asn Tyr Pro Asn Phe Leu Arg Asn Leu Ser 
450 455 460 

Met Pro Leu Val Gly Asn Ile Asp Leu Ala Thr Ile Asp Val Leu Cys 
465 470 475 480 

Asp Arg Glu Arg Gly Val Pro Arg Tyr Asn Glu Phe Arg Arg Glu Ile 
485 490 495 

Gly Leu Asn Pro Ile Thr Lys Leu Glu Asp Leu Thr Thr Asp Pro Ala 
500 505 510 

Thr Leu Ala Asn Leu Lys Arg Ile Tyr Gly Asn Asp Ile Glu Lys Ile 
515 520 525 

Asp Thr Leu Val Gly Met Leu Ala Glu Thr Val Arg Pro Asp Gly Phe 
530 535 540 

Ala Phe Gly Glu Thr Ala Phe Gln Ile Phe Ile Met Asn Ala Ser Arg 
545 550 555 560 

Arg Leu Met Thr Asp Arg Phe Tyr Thr Lys Asp Tyr Arg Pro Glu Ile 
565 570 575 

Tyr Thr Ala Glu Gly Leu Ala Trp Val Glu Asn Thr Thr Met Val Asp 
580 585 590 

Val Leu Lys Arg His Asn Pro Gln Leu Val Asn Ser Leu Val Gly Val 
595 600 605 

Glu Asn Ala Phe Lys Pro Trp Gly Leu Asn Ile Pro Ala Asp Tyr Glu 
610 615 620 

Ser Trp Pro Gly Lys Ala Lys Gln Asp Asn Leu Trp Val Asn Gly Ala 
625 630 635 640 

Xaa Arg Thr Gln Tyr Ala Ala Gly Gln Leu Pro Ala Ile Pro Pro Val 
645 650 655 

Asp Val Gly Gly Leu Ile Ser Ser Val Leu Trp Lys Lys Val Gln Thr 
660 665 670 

Xaa Ser Asp Val Ala Pro Ala Gly Tyr Glu Lys Ala Met His Pro His 
675 680 685 

Gly Val Met Ala Lys Val Lys Phe Thr Ala Val Pro Gly His Pro Tyr 
690 695 700 

Thr Gly Leu Phe Gln Gly Ala Asp Ser Gly Leu Leu Arg Leu Ser Val 
705 710 715 720 

Ala Gly Asp Pro Ala Thr Asn Gly Phe Gln Pro Gly Leu Ala Trp Lys 
725 730 735 

Ala Phe Val Asp Gly Lys Pro Ser Gln Asn Val Ser Ala Leu Tyr Thr 
740 745 750 

Leu Ser Gly Gln Gly Ser Asn His Asn Phe Phe Ala Asn Glu Leu Ser 
755 760 765 

Gln Phe Val Leu Pro Glu Thr Asn Asp Thr Leu Gly Thr Thr Leu Leu 
770 775 780 

Phe Ser Leu Val Ser Leu Lys Pro Thr Leu Leu Arg Val Asp Asp Met 
785 790 795 800 

Ala Glu Val Thr Gln Thr Gly Gln Ala Val Thr Ser Val Lys Ala Pro 
805 810 815 

Thr Gln Ile Tyr Phe Val Pro Lys Pro Glu Leu Arg Ser Leu Phe Ser 
820 825 830 

Ser Ala Ala His Asp Phe Arg Ser Asp Leu Thr Ser Leu Thr Ala Gly 
835 840 845 

Thr Lys Leu Tyr Asp Val Tyr Ala Thr Ser Met Glu Ile Lys Thr Ser 
850 855 860 

Ile Leu Pro Ser Thr Asn Arg Ser Tyr Ala Gln Gln Arg Arg Asn Ser 
865 870 875 880 

Ala Val Lys Ile Gly Glu Met Glu Leu Thr Ser Pro Phe Ile Ala Ser 
885 890 895 

Ala Phe Gly Asp Asn Gly Val Phe Phe Lys His Gln Arg His Glu Asp 
900 905 910 

Lys 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 525 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

ATGCAGCGGG GGCGCGGTTT CACTCTGATC GAGCTGCTGG TGGTGCTGGT GCTGCTGGGC 60 

GTGCTCACCG GCCTCGCCGT GCTCGGCAGC GGGATCGCCA GCAGCCCCGC GCGCAAGCTG 120 

GCGGACGAGG CCGAGCGCCT GCAGTCGCTG CTGCGGGTGC TGCTCGACGA GGCGGTGCTG 180 

GACAACCGCG AGTATGGCGT ACGCTTCGAC GCCCGGAGCT ACCGGGTGCT GCGCTTCGAG 240 

CCGCGCACGG CGCGCTGGGA GCCGCTCGAC GAGCGCGTGC ACGAGCTGCC GGAGTGGCTC 300 

GAGCTGGAGA TCGAGGTCGA CGAGCAGAGT GTCGGGCTGC CCGCCGCCCG TGGCGAGCAG 360 

GACAAAGCCG CGGCCAAGGC GCCACAGCTG CTGCTGCTCT CCAGTGGCGA GCTGACCCCC 420 

TTCGCCCTGC GCCTGTCCGC CGGCCGCGAG CGCGGCGCGC CGGTGCTGAC GCTGGCCAGC 480 

GACGGCTTCG CCGAGCCCGA GCTGCAGCAG GAAAAGTCCC GATGA 525 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 174 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



Met Gln Arg Gly Arg Gly Phe Thr Leu Ile Glu Leu Leu Val Val Leu 
1 5 10 15 

Val Leu Leu Gly Val Leu Thr Gly Leu Ala Val Leu Gly Ser Gly Ile 
20 25 30 

Ala Ser Ser Pro Ala Arg Lys Leu Ala Asp Glu Ala Glu Arg Leu Gln 
35 40 45 

Ser Leu Leu Arg Val Leu Leu Asp Glu Ala Val Leu Asp Asn Arg Glu 
50 55 60 

Tyr Gly Val Arg Phe Asp Ala Arg Ser Tyr Arg Val Leu Arg Phe Glu 
65 70 75 80 

Pro Arg Thr Ala Arg Trp Glu Pro Leu Asp Glu Arg Val His Glu Leu 
85 90 95 

Pro Glu Trp Leu Glu Leu Glu Ile Glu Val Asp Glu Gln Ser Val Gly 
100 105 110 

Leu Pro Ala Ala Arg Gly Glu Gln Asp Lys Ala Ala Ala Lys Ala Pro 
115 120 125 

Gln Leu Leu Leu Leu Ser Ser Gly Glu Leu Thr Pro Phe Ala Leu Arg 
130 135 140 

Leu Ser Ala Gly Arg Glu Arg Gly Ala Pro Val Leu Thr Leu Ala Ser 
145 150 155 160 

Asp Gly Phe Ala Glu Pro Glu Leu Gln Gln Glu Lys Ser Arg 
165 170 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 390 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



ATGAAGCGCG GCCGCGGCTT CACCCTGCTC GAGGTGCTGG TGGCCCTGGC GATCTTCGCC 60 

GTGGTCGCCG CCAGCGTGCT CAGCGCCAGC GCTCGCTCGC TGAAGACCGC CGCGCGCCTG 120 

GAGGACAAGA CCTTCGCCAC CTGGCTGGCG GACAACCGCC TGCAGGAGCT GCAGCTGGCC 180 

GACGTGCCGC CGGGCGAGGG CCGCGAGCAG GGCGAGGAGA GCTACGCCGG GCGGCGCTGG 240

CTGTGGCAGA GCGAGGTGCA GGCCACCAGC GAGCCGGAGA TGCTGCGTGT CACCGTACGG 300 

GTGGCGCTGC GGCCGGAGCG CGGGCTGCAG GGCAAGATCG AAGACCATGC CCTGGTGACC 360 

CTGAGTGGCT TCGTCGGGGT CGAGCCATGA 390 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 129 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 



Met Lys Arg Gly Arg Gly Phe Thr Leu Leu Glu Val Leu Val Ala Leu

1 5 10 15

Ala Ile Phe Ala Val Val Ala Ala Ser Val Leu Ser Ala Ser Ala Arg

20 25 30

Ser Leu Lys Thr Ala Ala Arg Leu Glu Asp Lys Thr Phe Ala Thr Trp

35 40 45

Leu Ala Asp Asn Arg Leu Gln Glu Leu Gln Leu Ala Asp Val Pro Pro

50 55 60

Gly Glu Gly Arg Glu Gln Gly Glu Glu Ser Tyr Ala Gly Arg Arg Trp
65 70 75 80

Leu Trp Gln Ser Glu Val Gln Ala Thr Ser Glu Pro Glu Met Leu Arg

85 90 95

Val Thr Val Arg Val Ala Leu Arg Pro Glu Arg Gly Leu Gln Gly Lys

100 105 110

Ile Glu Asp His Ala Leu Val Thr Leu Ser Gly Phe Val Gly Val Glu

115 120 125

Pro



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 684 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

ATGAGGCAGC GCGGCTTCAC CCTGCTGGAA GTGCTGATCG CCATCGCCAT CTTCGCCCTG 60 

CTGGCCATGG CCACCTACCG CATGCTOGAC AGCGTGCTGC AGACCGATCG TGGCCAGCGC 120 

CAGCAGGAGC AGCGTCTGCG CGAGCTGACG CGGGCCATGG CAGCTTTCGA ACGCGACCTG 180

CTGCAGGTGC GCCTGCGTCC GGTGCGCGAC CCGCTGGGCG ACCTGCTGCC AGCCCTGCGC 240

GGCAGCAGTG GCCGCGACAC CCAGCTGGAG TTCACCCGCA GCGGCTGGCG CAACCCGCTC 300

GGCCAGCCGC GCGCCACCCT ACAGCGGGTG CGCTGGCAGC TCGAAGGCGA GCGCTGGCAG 360 

CGCGCTTACT GGACGGTGCT GGACCAGGCC CAGGACAGCC AGCCGCGGGT GCAGCAGGCG 420 

CTGGATGGCG TGCGCCGCTT CGACTTGCGC TTTCTOGACC AGGAGGGGCG CTGGCTGCAG 480 

GACTGGCCGC CGGCCAACAG TGCTGCCGAC GAGGCCCTGA CCCAGCTGCC GCGTGCCGTC 540 

GAGCTGGTCC TCGAGCACCG CCATTACGGT GAACTGCGCC GTCTCTGGCG CTTGCCCGAG 600 

ATGCCGCAGC AGGAACAGAT CACGCCGCCC GGGGGCGAGC AGGGCGGTGA GCTGCTGCCG 660 

GAAGAGCCGG AGCCCGAGGC ATGA 684 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 227 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 



Met Arg Gln Arg Gly Phe Thr Leu Leu Glu Val Leu Ile Ala Ile Ala

1 5 10 15

Ile Phe Ala Leu Leu Ala Met Ala Thr Tyr Arg Met Leu Asp Ser Val

20 25 30

Leu Gln Thr Asp Arg Gly Gln Arg Gln Gln Glu Gln Arg Leu Arg Glu

35 40 45

Leu Thr Arg Ala Met Ala Ala Phe Glu Arg Asp Leu Leu Gln Val Arg

50 55 60

Leu Arg Pro Val Arg Asp Pro Leu Gly Asp Leu Leu Pro Ala Leu Arg
65 70 75 80

Gly Ser Ser Gly Arg Asp Thr Gln Leu Glu Phe Thr Arg Ser Gly Trp

85 90 95

Arg Asn Pro Leu Gly Gln Pro Arg Ala Thr Leu Gln Arg Val Arg Trp

100 105 110

Gln Leu Glu Gly Glu Arg Trp Gln Arg Ala Tyr Trp Thr Val Leu Asp

115 120 125

Gln Ala Gln Asp Ser Gln Pro Arg Val Gln Gln Ala Leu Asp Gly Val

130 135 140

Arg Arg Phe Asp Leu Arg Phe Leu Asp Gln Glu Gly Arg Trp Leu Gln
145 150 155 160

Asp Trp Pro Pro Ala Asn Ser Ala Ala Asp Glu Ala Leu Thr Gln Leu

165 170 175

Pro Arg Ala Val Glu Leu Val Val Glu His Arg His Tyr Gly Glu Leu

180 185 190

Arg Arg Leu Trp Arg Leu Pro Glu Met Pro Gln Gln Glu Gln Ile Thr

195 200 205

Pro Pro Gly Gly Glu Gln Gly Gly Glu Leu Leu Pro Glu Glu Pro Glu

210 215 220

Pro Glu Ala
225



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 954 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

ATGAGCCGGC AGCGCGGCGT GGCACTGATC ACCGTGCTGC TGGTGGTGGC GCTGGTGACC 60 

GTGGTCTGCG CGGCCCTGCT GCTGCGCCAG CAGCTGGCCA TCCGCAGCAC CGGCAACCAG 120 

CTGCTGGTGC GCCAGGCCCA GTACTACGCC GAAGGCGGCG AGCTGCTGGC CAAGGCCCTG 180 

CTGCGTCGCG ACCTGGCCGC CGACCAGGTC GATCATCCCG GCGAGCCCTG GGCCAACCCC 240 

GGCCTGCGCT TCCCCCTGGA TGAGGGCGGC GAGCTGCGCC TGCGCATCGA GGACCTGGCC 300 

GGACGTTTCA ACCTCAACAG CCTGGCCGCC GGTGGTGAGG CCGGTGAGTT GGCGCTGCTG 360 

CGCCTGCGGC GCCTGCTGCA GCTGCTGCAG CTGACCCCGG CCTATGCCGA GCGCCTGCAG 420 

GACTGGCTCG ACGGCGATCA GGAGGCCAGC GGCATGGCCG GCGCCGAGGA TGACCAGTAC 480

CTGCTGCAGA AACCGCCCTA CCGTACCGGC CCCGGGCGCA TTGCCGAGGT GTCGGAGCTG 540

CGCCTGCTGC TGGGCATGAG CGAGGCCGAC TACCGCCGCC TGGCCCCCTT CGTCAGCGCC 600

CTGCCGAGCC AGGTCGAGCT GAACATCAAC ACCGCCAGCG CCCTGGTGCT GGCTTGCCTG 660 

GGCGAGGGCA TNCCCGAGGC GGTGCTCGAG GCCGCCATCG ANGGTCGCGG CCGCAGCGGC 720 

TATCGCGAGC CCGCTGCCTT CGTCCAGCAN CTTGCCAGCT ACGGCGTCAG CCCGCAGGGG 780 

CTGGGCATCG CCAGCCAGTA TTTCCGTGTC ACCACCGAGG TGCTGCTGGG TGAGCGGCGC 840 

CAGGTGCTGG CCAGTTATCT GCAACGTGGT AATGATGGGC GCGTCCGCCT GATGGCGCGC 900 

GATCTGGGGC AGGAGGGCCT GGCGCCCCCA CCCGTCGAGG AGTCCGAGAA ATGA 954 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 317 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 



Met Ser Arg Gln Arg Gly Val Ala Leu Ile Thr Val Leu Leu Val Val

1 5 10 15

Ala Leu Val Thr Val Val Cys Ala Ala Leu Leu Leu Arg Gln Gln Leu

20 25 30

Ala Ile Arg Ser Thr Gly Asn Gln Leu Leu Val Arg Gln Ala Gln Tyr

35 40 45

Tyr Ala Glu Gly Gly Glu Leu Leu Ala Lys Ala Leu Leu Arg Arg Asp

50 55 60

Leu Ala Ala Asp Gln Val Asp His Pro Gly Glu Pro Trp Ala Asn Pro
65 70 75 80

Gly Leu Arg Phe Pro Leu Asp Glu Gly Gly Glu Leu Arg Leu Arg Ile

85 90 95

Glu Asp Leu Ala Gly Arg Phe Asn Leu Asn Ser Leu Ala Ala Gly Gly

100 105 110

Glu Ala Gly Glu Leu Ala Leu Leu Arg Leu Arg Arg Leu Leu Gln Leu

115 120 125

Leu Gln Leu Thr Pro Ala Tyr Ala Glu Arg Leu Gln Asp Trp Leu Asp

130 135 140

Gly Asp Gln Glu Ala Ser Gly Met Ala Gly Ala Glu Asp Asp Gln Tyr
145 150 155 160

Leu Leu Gln Lys Pro Pro Tyr Arg Thr Gly Pro Gly Arg Ile Ala Glu

165 170 175

Val Ser Glu Leu Arg Leu Leu Leu Gly Met Ser Glu Ala Asp Tyr Arg

180 185 190

Arg Leu Ala Pro Phe Val Ser Ala Leu Pro Ser Gln Val Glu Leu Asn

195 200 205

Ile Asn Thr Ala Ser Ala Leu Val Leu Ala Cys Leu Gly Glu Gly Xaa

210 215 220

Pro Glu Ala Val Leu Glu Ala Ala Ile Xaa Gly Arg Gly Arg Ser Gly
225 230 235 240

Tyr Arg Glu Pro Ala Ala Phe Val Gln Xaa Leu Ala Ser Tyr Gly Val

245 250 255

Ser Pro Gln Gly Leu Gly Ile Ala Ser Gln Tyr Phe Arg Val Thr Thr

260 265 270

Glu Val Leu Leu Gly Glu Arg Arg Gln Val Leu Ala Ser Tyr Leu Gln

275 280 285

Arg Gly Asn Asp Gly Arg Val Arg Leu Met Ala Arg Asp Leu Gly Gln

290 295 300

Glu Gly Leu Ala Pro Pro Pro Val Glu Glu Ser Glu Lys



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1146 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

ATGAGTCTGC TCACCCTGTT TCTGCCGCCC CAGGCCTGCA CCGAGGCGAG CGCCGACATG 60 

CCGGTGTGGT GCGTOGAGAG CGACAGCTGC CGTCAGCTGC CCTTCGCCGA GGCCTTGCCG 120 

GCCGACGCGC GGGTCTGGCG CTTGGTGCTG CCGGTGGAGG CGGTGACCAC CTGTGTCGTG 180 

CAGTTGCCGA CCACCAAGGC ACGCTGGCTG GCCAAGGCCC TGCCGTTCGC CGTCGAGGAG 240 

CTGCTGGCCG AGGAGGTGGA GCAGTTTCAC CTGTGCGTCG GTAGCGCGCT GGTCGATGGT 300 

CGTCATCGTG TTCATGCCCT GCGCCGCGAG TGGCTGGCCG GCTGGCTGGC GCTGTGCGGC 360 

GAGCGGCCGC CGCAGTGGAT CGAGGTGGAC GCCGACCTGT TGCCGGAGGA GGGTAGCCAG 420 

CTGCTCTGCC TGGGCGAGCG CTGGTTGCTC GGCGGGTCGG GCGAGGCGCG CCTGGCCCTG 480 

CGTGGCGAGG ACTGGCCGCA GCTGGCGGCG CTCTGTCCGC CGCCCCGGCA AGCCTATGTG 540 

CCGCCCGGGC AGGCGGCGCC GCCGGGCGTC GAGGCCTGCC AGACGCTGGA GCAGCCGTGG 600 

CTCTGGCTGG CCGCGCAGAA GTCCGGCTGC AACCTGGCCC AGGGGCCTTT CGCCCGTCGC 660 

GAGCCTTCCG GCCAGTGGCA GCGCTGGCGG CCGCTGGCGG GGCTGCTCGG TCTCTGGCTG 720 

GTGCTGCAKT GGGGCTTCAA CCTTGCCCAN GGCTGGCAGC TGCAGCGCGA GGGTGAACGC 780 

TATGCCGTGG CCAACGAGGC GCTGTATCGC GAGCTGTTCC CCGAGGATCG CAAGGTGATC 840 

AACCTGCGTG CGCAGTTCGA CCAGCACCTG GCCGAGGCGG CTGGGAGCGG CCAGAGCCAG 900

TTGCTGGCCC TGCTCGATCA GGCCGCCGCG GCCATCGGCG AAGGGGGGGC GCAGGTGCAG 960 

GTGGATCAGC TCGACTTCAA CGCCCAGCGT GGCGACCTGG CCTTCAACCT GCGTGCCAGC 1020 

GACTTCGCCG CGCTGGAAAG CCTGCGGGCG CGCCTGCAGG AGGCCGGCCT GGCGGTGGAC 1080 

ATGGGCTCGG CGAGCCGCGA GGACAACGGC GTCAGTGCGC GCCTGGTGAT CGGGGGTAAC 1140 

GGATGA 1146 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 381 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Met Ser Leu Leu Thr Leu Phe Leu Pro Pro Gln Ala Cys Thr Glu Ala

1 5 10 15

Ser Ala Asp Met Pro Val Trp Cys Val Glu Ser Asp Ser Cys Arg Gln

20 25 30

Leu Pro Phe Ala Glu Ala Leu Pro Ala Asp Ala Arg Val Trp Arg Leu

35 40 45

Val Leu Pro Val Glu Ala Val Thr Thr Cys Val Val Gln Leu Pro Thr

50 55 60

Thr Lys Ala Arg Trp Leu Ala Lys Ala Leu Pro Phe Ala Val Glu Glu
65 70 75 80

Leu Leu Ala Glu Glu Val Glu Gln Phe His Leu Cys Val Gly Ser Ala

85 90 95

Leu Val Asp Gly Arg His Arg Val His Ala Leu Arg Arg Glu Trp Leu

100 105 110

Ala Gly Trp Leu Ala Leu Cys Gly Glu Arg Pro Pro Gln Trp Ile Glu

115 120 125

Val Asp Ala Asp Leu Leu Pro Glu Glu Gly Ser Gln Leu Leu Cys Leu

130 135 140

Gly Glu Arg Trp Leu Leu Gly Gly Ser Gly Glu Ala Arg Leu Ala Leu
145 150 155 160

Arg Gly Glu Asp Trp Pro Gln Leu Ala Ala Leu Cys Pro Pro Pro Arg

165 170 175

Gln Ala Tyr Val Pro Pro Gly Gln Ala Ala Pro Pro Gly Val Glu Ala

180 185 190

Cys Gln Thr Leu Glu Gln Pro Trp Leu Trp Leu Ala Ala Gln Lys Ser

195 200 205

Gly Cys Asn Leu Ala Gln Gly Pro Phe Ala Arg Arg Glu Pro Ser Gly

210 215 220

Gln Trp Gln Arg Trp Arg Pro Leu Ala Gly Leu Leu Gly Leu Trp Leu
225 230 235 240

Val Leu Xaa Trp Gly Phe Asn Leu Ala Xaa Gly Trp Gln Leu Gln Arg

245 250 255

Glu Gly Glu Arg Tyr Ala Val Ala Asn Glu Ala Leu Tyr Arg Glu Leu

260 265 270

Phe Pro Glu Asp Arg Lys Val Ile Asn Leu Arg Ala Gln Phe Asp Gln

275 280 285

His Leu Ala Glu Ala Ala Gly Ser Gly Gln Ser Gln Leu Leu Ala Leu

290 295 300

Leu Asp Gln Ala Ala Ala Ala Ile Gly Glu Gly Gly Ala Gln Val Gln
305 310 315 320

Val Asp Gln Leu Asp Phe Asn Ala Gln Arg Gly Asp Leu Ala Phe Asn

325 330 335

Leu Arg Ala Ser Asp Phe Ala Ala Leu Glu Ser Leu Arg Ala Arg Leu

340 345 350

Gln Glu Ala Gly Leu Ala Val Asp Met Gly Ser Ala Ser Arg Glu Asp

355 360 365

Asn Gly Val Ser Ala Arg Leu Val Ile Gly Gly Asn Gly
370 375 380



WO 98/06836 



42 



PCT/US97/14450



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4377 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GAATTCGCCG CCGAGCTGGC CAAGCCGCTG GGCGCGGTGA CCGCACAGAA GGAAGTGGAG 60 

CGTGCCCTGC GOGACCTGCA CCTGCCCTTC GACGAGCGCC GTCCCTACGC CCTGCGCCGT 120 

CTGCGCGACC GCATCGAGGC CAATCTCTCC GGCCTGATGG GCCCCAGCGT GGCCCAGGAC 180 

ATGGTGGAAA CCTTCCTGCC CTACAAGGCC GGCAGCGAGG CCTATGTCAG CGAAGACATC 240 

CACTTCATCG AGAGTCGCCT GGAGGATTAC CAGTCGCGCC TCACCGGCCT GGCCGCCGAG 300 

QTCGACGCGC TGCGCCGCTT CCACCGCCAG ACCCTGCAGG AACTGCCGAT GGGCGTATGT 360 

TCGCTGGCCA AGGACCAGGA AGTGCTGATG TGGAACCGCG CCATGGAGGA ACTCACCGGC 420 

ATCAGCGCGC AGCAGGTGGT CGGCTCGCGC CTGCTCAGCC TGGAGCACCC CTGGCGCGAG 480 

CTGCTGCAGG ACTTCATCGC CCAGGACGAG GAGCACCTGC ACAAGCAGCA CCTGCAACTG 540 

GACGGCGAGG TGCGCTGGCT CAACCTGCAC AAGGCGGCCA TCGACGAACC GCTGGCGCCG 600 

GGCAACAGCG GCCTGGTGCT GCTGGTCGAG GACGTCACCG AGACCCGCGT GCTGGAAGAC 660 

CAGCTGGTGC ACTCCGAGCG TCTGGCCAGC ATCGGCCGCC TGGCCGCCGG GGTGGCCCAC 720 

GAGATCGGCA ATCCGGTCAC CGGCATCGCC TGCCTGGCGC AGAACCTGCG CGAGGAGCGC 780 

GAGGGCGACG AGGAGCTCGG CGAGATCAGC AACCAGATCC TCGACCAGAC CAAGCGCATC 840 

TCGCGCATCG TCCAGTCGCT GATGAACTTC GCCCACGCCG GCCAGCAGCA GCGCGCCGAA 900 

TACCCGGTGA GCCTGGCCGA AGTGGCGCAG GACGCCATCG GCCTGCTGTC GCTGAACCGC 960 

CATGGCACCG AAGTGCAGTT CTACAACCTG TGCGATCCCG AGCACCTGGC CAAGGGCGAC 1020 

CCGCAGCGCC TGGCCCAGGT GCTGATCAAC CTGCTGTCCA ACGCCCGCGA TGCCTCGCCG 1080 

GCCGGCGGTG CCATCCGCGT GCGTAGCGAG GCCGAGGAGC AGAGCGTGGT GCTGATCGTC 1140 

GAGGACGAGG GCACGGGCAT TCCGCAGGCG ATCATGGACC GCCTGTTCGA ACCCTTCTTC 1200 

ACCACCAAGG ACCCCGGCAA GGGCACCGGT TTGGGGCTCG CGCTGGTCTA TTCGATCGTG 1260 

GAAGAGCATT ATGGGCAGAT CACCATCGAC AGCCCGGCCG ATCCCGAGCA CCAGCGCGGA 1320 

ACCCGTTTCC GCGTGACCCT GCCGCGCTAT GTCGAAGCGA CGTCCACAGC GACCTGAGTA 1380 

GTGACCTAGA ACCGCCGAGG GGCCACAAGC CCGGCGGATT CGGAGACCGT CGAGAGAACA 1440 

CAATGCCGCA TATCCTCATC GTCGAAGACG AAACCATCAT CCGCTCCGCC CTGCGCCGCC 1500 

TGCTGGAACG CAACCAGTAC CAGGTCAGCG AGGCCGGTTC GGTTCAGGAG GCCGAGGAGC 1560 

GCTACAGCAT TCCGACCTTC GACCTGGTGG TCAGCGACCT GCGCCTGCCC GGCGCCCCCG 1620 

GCACCGAGCT GATCAAGCTG GCCGACGGCA CCCCGGTACT GATCATGACC AGCTATGCCA 1680 

GCCTGCGCTC GGCGGTGGAC TCGATGAAGA TGGGCGCGGT GGACTACATC GCCAAGCCCT 1740 

TCGATCACGA CGAGATGCTC CAGGCCGTGG CGCGTATCCT GCGCGATCAC CAGGAGGCCA 1800 

AGCGCAACCC GCCAAGCGAG GCGCCCAGCA AGTCCGCCGG CAAGGGCAAC GGCGCCACCG 1860 

CCGAGGGCGA GATCGGCATC ATCGGCTCCT GCGCCGCCAT GCAGGACCTT TACGGCAAGA 1920 

TCCGCAAGGT CGCTCCCACC GATTCCAACG TACTGATCCA GGGCGAGTCC GGCACCGGCA 1980 

AGGAGCTGGT CGGGCGTGCG CTGCACAACC TCTCGCGTCG CGCCAAGGCA CCGCTGATCT 2040 

CGGTGAACTG CGCGGCCATC CCCGAGACCC TGATCGAGTC CGAACTGTTC GGCCACGAGA 2100 

AAGGTGCCTT CACCGGCGCC AGCGCCGGCC GCGCCGGCCT GGTCGAAGCG GCCGACGGCG 2160 

GCACCCTGTT CCTCGACGAG ATCGGCGAGC TGCCGCTGGA GGCGCAGGCC CGCCTGCTGC 2220 

GCGTGCTGCA GGAGGGCGAG ATCCGTCGGG TCGGCTCGGT GCAGTCACAG AAGGTCGATG 2280 

TACGCCTGAT CGCCGCTACC CACCGCGACC TCAAGACGCT GGCCAAGACC GGCCAGTTCC 2340 

GCGAGGACCT CTACTACCGC CTGCACGTCA TCGCCCTCAA GCTGCCGCCA CTGCGCGAGC 2400 

GCGGCGCCGA CGTCAACGAG ATCGCCCGCG CCTTCCTCGT CCGCCAGTGC CAGCGCATGG 2460 

GCCGCGAGGA CCTGCGCTTC GCTCAGGATG CCGAGCAGGC GATCCGCCAC TACCCCTGGC 2520 

CGGGCAACGT GCGCGAGCTG GAGAATGCCA TCGAGCGCGC GGTGATCCTC TGCGAGGGCG 2580 

CGGAAATTTC CGCCGAGCTG CTGGGCATCG ACATCGAGCT GGACGACCTG GAGGACGGCG 2640 

ACTTCGGCGA ACAGCCACAG CAGACCGCGG CCAACCACGA ACCGACCGAG GACCTGTCGC 2700 

TGGAGGACTA CTTCCAGCAC TTCGTACTGG AGCACCAGGA TCACATGACC GAGACCGAAC 2760 

TGGCGCGCAA GCTCGGCATC AGCCGCAAGT GCCTGTGGGA GCGCCGTCAG CGCCTGGGCA 2820 

TTCCGCGGCG CAAGTCGGGC GCGGCGACCG GCTCCTGAAC GGGACGAACG GTGACAGGCC 2880 

TCGCCGCAAA AGGTTCCGCG CCTGTTACCC CGCACAAATA TCGCGTAACA AAAGCCGGGT 2940 
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TCATCGGTAA OGG6AACCC6 CTTTTTTCT GCCCGCCGCC CGCACCAAAA AATCATAACT 3000 

CATTGAAAAA CAAGGAATTA CAAAAACTGG CACGGCTTCT GCTTTATCTC TGGCACAACA 3060 

ACAATAACAA CGCTCGAAAC CTCAACAATA AAAACAATAC AGAACGACTC CAGCACAACA 3120

AAAACAACAA CGCGGAGGCG CAGCTAACTG ATTCTTTTGG AGAGGATTTG CCCTTGGGGT 3180 

TCGCCCCACA ACCAGGCCGA GAACAACAAA AACTGCACTA AAGCAGCGCC TGCACTGGTT 3240 

GGGTCATGGA ATGATCAAGG CAGCATCAGC ATCCAAAGCA ATCCGTTTGC TCCTGGTACC 3300 

CGATTTGGGC TACCTGAAAC GGGCCTACAA CAAAAACAAC AGGCCCGCAC AATAATAAAA 3360 

ACAAAGCACG CACCTATTTG GGGGGGAGCT TCGGCTCCCC CAGTAGCTTC ACCCCACCTC 3420 

GCGTTCCCCA GCCTGCCTTT TCCACCATCC CCCTTCCCGA TGCTAGAATC CGCGCCAATC 3480 

CTGCGGCGAT CTGCAATTGT GGCCGCCTAT TCCTGCAAAC AGTGCATCCC ATGCTGAAAA 3540 

AGCTGTTCAA GTCGTTTCGT TCACCTCTCA AGCGCCAAGC ACGCCCCCGC AGCACGCCGG 3600 

AAGTTCTCGG CCCGCGCCAG CATTCCCTGC AACGCAGCCA CTTCAGCCGC AATGCGGTAA 3660 

ACGTGGTGGA GCGCCT6CAG AACGCCGGCT ACCAGGCCTA TGTGGTCGGC GGCTGCGTAC 3720 

GCGACCTGCT GATCGGCGTG CAGCCCAAGG ACTTCGAOGT GGCCACCAGC GCCACCCCCG 3780 

AGCAGGTGCG GGCCGAGTTT CGCAACGCCC GGGTGATCGG CCGCCGCTTC AAGCTGGCGC 3840 

ATGTGCATTT CGGCCGC6AG ATCATCGAGG TGGCGACCTT CCACAGCAAC CACCCGCAGG 3900 

GCGACGACGA GGAAGACAGC CACCAGTCGG CCCGTAACGA GAGCGGGCGC ATCCTGCGCG 3960 

ACAACGTCTA CGGCAGTCAG GAGAGCGATG CCCAGCGCCG CGACTTCACC ATCAACGCCC 4020 

TGTACTTCGA CGTCAGCGGC GAGCGCGTGC TGGACTATGC CCACGGCGTG CACGACATCC 4080 

GCAACCGCCT GATCCGCCTG ATCGGCGACC CCGAGCAGCG CTACCTGGAA GACCCGGTAC 4140 

GCATGCTGCG OGCCGTACGC TTCGCCGCCA AGCTGGACTT CGACATCGAG AAACACAGCG 4200 

CCGCGCCGAT CCGCCGCCTG GCGCCGATGC TGCGCGACAT CCCTGCCGCG CGCCTGTTCG 4260 

ACGAGGTGCT CAAGCTGTTC CTCGCCGGCT ACGCCGAGCG CACCTTCGAA CTGCTGCTCG 4320 

AGTAOGACCT GTTCGCCCCG CTGTTCCCGG CCAGCGCCCG CGCCCTGGAG CGCGATC 4377 



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17612 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

GATCTCGAGG GCGTCGGCTT CGACACCCTG GCGGTGCGCG CCGGTCAGCA TCGCACGCCG 60 

GAGGGCGAGC ATGGCGAGGC CATGTTCCTC ACCTCCAGCT ATGTGTTCCG CAGCGCCGCC 120 

GACGCCGCCG CGCGCTTCGC CGGCGAGCAG CCGGGCAACG TCTACTCGCG CTACACCAAC 180 

CCGACCGTGC GCGCCTTCGA GGAGCGCATC GCCGCCCTGG AAGGCGCCGA GCAGGCGGTG 240 

GCCACCGCCT CCGGCATGGC CGCCATCCTG GCCATCGTCA TGAGCCTGTG CAGCGCCGGC 300 

GACCATGTGC TGGTGTCGCG CAGCGTGTTC GGCTCGACCA TCAGCCTGTT CGAGAAGTAC 360 

CTCAAGCGCT TCGGCATCGA GGTGGACTAC CCGCCGCTGG CCGATCTGGA CGCCTGGCAG 420 

GCAGCCTTCA AGCCCAACAC CAAGCTGCTG TTCGTCGAAT CGCCGTCCAA CCCGTTGGCC 480 

GAGCTGGTGG ACATAGGCGC CCTGGCCGAG ATCGCCCACG CCCGCGGCGC CCTGCTGGCG 540 

GTGGACAACT GCTTCTGCAC CCCGGCCCTG CAGCAGCCGC TGGCGCTGGG CGCCGATATG 600 

GTCATGCATT CGGCGACCAA GTTCATCGAT GGCCAGGGCC GCGGCCTGGG CGGCGTGGTG 660 

GCCGGGOGCC GTGCGCAGAT GGAGCAGGTG GTCGGCTTCC TGCGCACCGC CGGGCCGACC 720 

CTCAGCCCGT TCAACGCCTG GATGTTCCTC AAGGGCCTGG AGACCCTGCG TATCCGCATG 780 

CAGGCGCAGA GCGCCAGCGC CCTGGAACTG GCCCGCTGGT TGGAGACCCA GCCGGGCATC 840 

GACAGGGTCT ACTATGCCGG CCTGCCCAGC CACCCGCAGC ACGAGCTGGC CAAGCGGCAG 900 

CAGAGTGCCT TCGGCGCGGT GCTGAGCTTC GAGGTCAAGG GCGGCAAGGA GGCGGCCTGG 960 

CGTTTCATCG ATGCCACCCG GGTGATCTCC ATCACCACCA ACCTGGGCGA TACCAAGACC 1020 

ACCATCGCCC ATCCGGCGAC CACCTCCCAC GGTCGTCTGT CGCCGCAGGA GCGCGCCAGC 1080 

GCCGGTATCC GCGACAACCT GGTGCGTGTC GCCGTGGGCC TGGAAGACGT GGTCGACCTC 1140 

AAGGCCGACC TGGCCCGTGG CCTGGCCGCG CTCTGAGGAC GGGGGCCCCC GTTCCTGCCG 1200 

CGAAGGGCAG GGGCGGGGGC TTGCGGCGGG CCTTTGCGCG ATCAGCAGCT AGTCTTGGGG 1260 

AAACGTCCTA GCCCAGGAGC TACCCCATGA ACCTCATCCT TTTCCTGATC ATCGGCGCCG 1320 

TTGCCGGCTG GATCGCCGGC AAGTTGCTGC GTGGTGGCGG CTTCGGGCTG ATCGGCAACC 1380 




TGGT66TGG6 CATAGTGGGC GCG6TGATC6 GOGGCCACCT GTTCAGCTAC CTGGGCGTGT 1440 

CCGCCGGTGG TGGGCTGATC GGCTCGCTGG TGACCGCGGT GATCGGTGCC CTGGTCCT6C 1500 

TGTTCATCGT CGGCCTGATC AAGAAGGCCC AGTAGCGCTG GCGGGACGCC GTCCCGCCGC 1560 

CCATCACTGG TCGCGCAGGT CCACGGCACC GGCGCCGGGT TTGTCGAACA GGCGCTCGGC 1620 

GCTGCCCGGC AGGCTGCTGT GGCCATCCTC GTCGGCACCC AGCACGCTGA TGTCGCTGTA 1680 

CTTCTTGCCC GACAGCGCGG CGATGCCGGC GCGGTCGCGG AOGATGGTCG GGCGCAGGAA 1740 

CACCATCAGG TTGCGCTTGA CGTGGGTGTC CTTGGTCGAG CGGAACAGCC GGCCGATCAG 1800 

OGGGATGTCA CCCAGCAGCG GCACCTTGGA GTCGGTGCTG GTGACGTCGT CCTGGATCAG 1860 

CCCTCCCAGC ACTATGACCT GGCCGTCGTC GGCCAGGATC ACGCTCTTGA TOGAGCGCTT 1920 

GTTGGTCACC AGGTCCACCG CCTGGGCATT GACCCCGGCG CTGGGGGGGA TGGAGGAGAT 1980 

CTCCTGCTCC ACTTCCAGGC GCAGGGTGGC GCCGTCGTTG ATGTGCGGGG TGACCTTGAG 2040

GGTCACGCCG ATGTCCTCGC GCTCAATGGT GGTGAAGGGG TTGTTCGCCC COGAGGCGTC 2100 

GGTGGTGTAG GAGCCGGTCT GGAAAGGCAC CTTCTGCCOQ ACCAGGATTT CCGCCTCCTG 2160

GTTGTCCAGG GTCAGCAGGC TGGGCGTGGA CAGCAGGTTG CTCTTGCTGT TGGCAGAGAG 2220 

GGCAGTGATC AGCGCGCCGA AGTTCTCGGT GCCGATGCCG ATGATGGCGC CGTCCGGCAG 2280 

GGTCAGGTCA TCGGGGATTT CCTCGTTCTG GATGGCCTTG AGCACGGTGC CCACCGATAG 2340 

CCCGGTATTG CCGAAGTTGA CCCCGCCGAG GCCGCCGGTG CCGCCGCGGG GATCCACCGC 2400 

CCACTGCACG COGAGGGCGT CGCTGATGTC CCCGGAGATT TCCACGATGG CCGCCTCGAC 2460 

CATCACCTGG GCGCGCGGGA OGTCGAGGTT GCGCACGATT TCCTCGAGGG TCGCCACGGT 2520 

GTCCGGATCG GCCAGCAGGA CCAGGGCATT GAGGCTCTCG TCGGCGOGGA TCAGGATGTT 2580 

CTGCGGCTTG CTGCTGGCGG CTTCGCCACC ACCCTCCGCG GTCTTCAACC CCTCGGAGAT 2640 

GTCGCCCAGG GTCTCGGCCA GGCTCTTGGC GTCGCTGTGG CGTAGGCGAA TTACCCGCGC 2700 

ATTGGCCGAA CGGGTGCTGG GGATGTCCAG CGAGCGGGCC AGGTTGGCCA GGCGCTGGCG 2760 

GGCGGCCGGC GGGCCGAGGA GGATGAGGCG GTTGGTGCGG GCGTCGGCAA TCACCCGGGT 2820 

GCOGGCGCTG TTTTTCTCGT TGCGCATCAC CGCGTTGTTC AGTGCCTCGG CGGCGTCCAG 2880 

TACCCAGGCA TGCTGCAGGT TGATCACGTT GTAGTCGCCG CCGCCCTGGG CATCGAGCTC 2940 

GGCGATCAGT TCGCGGATGC GTTCGATATT HGCCCGGCGG TCGCTGATGA TCAGCGCGTT 3000 

GGAGGCGGCG ACCGCOGCCA GGTGGCCGTT CTGCGGCACC AGCGGGCGGA TCAGCGGGAT 3060 

CAGTTCGTTG ACCGAGGTGT GCTGCACCTG GATCAGCTCG GTCTGCACAT CGTCCGGCGC 3120 

GCTGOGGCTG CTGTTGGCGC CGCTACGCGC CTCGGTGACC GGCACGATGC GCGCCTGGTC 3180 

GCCCTGTGCC AGCACGCTGA AGCCATGGGT GCTCATCACC GAAAGGAACA GCTGGTAGAC 3240 

CTCCTCGAGG CCCAGCGGGG TCTTGGAGAT CACCGTGACC TGGCCCTTGA CCCGCGGATC 3300 

GACGACGAAG GTCTCGCCAG AGATCTGCGC CACCTGGTCG ATGAAGTCGC GGATATCGGC 3360 

GTCCTTCATG TTGATGGTCC AGGTCTCGGC GCCCTGGCTC ACCGCCACCG GCTCGGCGGC 3420 

ATGGACGAGC GGCAGCGGGG CGGCGAGGCA GCTCGCGGCC AGCAGCAGGG CGAGGGGCAG 3480 

GCGTTTGTGC GGCGGAATTC TGGAGTCGAT CATGGGCTGT CTTCGGCTTC CGGTATTTCG 3540 

GGCTGCGGGA TGTCGCCGCC TTCCATGCGT TGTTGAAGGG TCTGGATGCG CTCCTGCAGG 3600 

GCCTGGACGT CTTCGTCCTG CAGCTGTTCC AGTTGGCTGG CGGTGGGCTC CAGCGCOGAG 3660 

TAGGCCGGCG TCAGAGAGGG CTGGCGCACG GCGGGGAAGC GCAGGCTCTC CTCGACGCCG 3720 

CCGCGGTCGA GCACCACGTG GTCCTGATAG ACGGCCTGCA GGCGGGTGCT GACGTTGACC 3780 

GATTCGCCCA CGGCGATGCG CTTGGGTTTG TCGCCGGCGA CCTGGATGAT CGCCGTGGAG 3840 

CGCTTGGCGT CCGGGTTGAC GAAGCTGGCC AGCAGGGTCA TCTGCTGCCG GGTGGOGGGG 3900 

GCGGCCTGGT CGCCGCGCGG CCTGGCCGCG GGCGTGCCGA ACAGATGCTG CAGGCGCTGG 3960 

ATGGACAGCG GCTGGCGCTC GGCGATGCTC TCTGGGGCGG GCGGTGGCGC GGCCTCGCTG 4020 

CGCAGCAGGC GAAGGAAGTC GATGCTCTGC TTGCTCAGGC TGAGGGTGAT GAGCAGCACC 4080 

ACGAGCAGGC AGAGGCCGGT CACGCCGTGG CGCTGCAGCC AGGCGGGCAG GCGGGTGCGG 4140 

GTGCTACTCA AGGCATGGTT CCCCCGGTGT TCTTCTTATT CTGTGCGGAC GCTCTGCTCG 4200 

GCGTCTCGCA ATCCGGCCCG TACTCTGCGG GCGCAGGCAA CCTTAACGCA AGTCTCCTGT 4260 

CCATGGCGCA CCTGCTTCGT CTATCTGCGC GCTGGCGCAC TGTCCGCCGC TGCCGGAAGC 4320 

GTGAAACATT TCGAAACTTT CGGCGAACGA GTCGCTATCA TCGGCCCCAC GCGCTTCCCG 4380 

TTCAACAATA GCAATAAGCC AGACGGATTA CCGCCATGGA AGATCGCAAG CCGCCTGCCG 4440 

CGGCTCCCGT GGGGTTTGOG CGCGCGGAGC TGCTGGAGCT GCTCTGCCGC TGCGAGCAGT 4500 

TTCCCCTGAC CCTGCTGCTG GCGCCCGCCG GTTCCGGCAA GTCGACCCTG CTGGCCCAGT 4560 

GGCAGGCCAG CCGGCCCTTC GGCAGTGTGG TGCACTATCC ACTGCAGGCG CGTGACAACG 4620 

AGCCGGTACG CTTCTTCCGC CACCTGGCCG AAAGCATCCG CGCCCAGGTC GAGGACTTCG 4680 

AOCTGTCCTG GTTCAACCCC TTCGCCGCCG AGATGCACCA GGCGCCCGAG GTGCTCGGCG 4740 

AGTACCTGGC CGACGCCCTC AATCGCATCG AGAGCCGCCT CTACCTCGTC CTCGACGACT 4800 

TCCAGTGCAT CGGCCAGCCG ATCATCCTCG ACGTGCTCTC GGCCATGCTC GAACGCCTGG 4860 

CGGGCAACAC CCGGGTCATT CTGTCCGGGC GCAACCATCC GGGGTTCTCC CTCAGCCGCC 4920

TGAAACTGGA CAACAAGCTG CTGTGCATCG ACCAGCACGA CATGCGCCTG TCGCCAGTGC 4980

AGATCCAACA CCTCAATGCC TACCTGGGCG GTCCCGAGCT CAGCCCGGCC TATGTCGGCA 5040 

GCCTGATGGC CATGACCGAG GGCTGGATGG TCGGGGTGAA GATGGCCCTG ATGGCCCATG 5100 

CGCGCTTCGG CACCGAGGCC CTGCAGCGCT TCGGTGGCGG CCATCCGGAG ATAGTCGACT 5160 

ACTTCGGCCA TGTGGTGCTG AAGAAGCTGT CGCCGCAGCT GCACGACTTC CTGTTGTGCA 5220 

GC6CGATCTT CGAGCGCTTC GACGGCGAGC TATGCGACCG GGTGCTGGAT OGCA6CGGTT 5280 

CGGCCCTGCT GCTGGAGGAC CTGGCCGCGC GCGAGCTGTT CATGCTGCCG GTGGACGAGT 5340 

ATCCCGGCTG CTACOGCTAC CACGCCCTGT TGCACGATTT CCTCGCCCGG CGCCTGGCCG 5400 

TGCACAAGCC ACAGGAAGTG GCGCAACTGC ACCGGCGGGC GGCCCTGGCG CTGCAGCAGC 5460 

GTGGCGACCT GGAGCTGGCC CTGCAGCATG CCCAGCGCAG TGGCGACCGC GCGTTGTTCC 5520 

AAAGCATGCT GGGCGAGGCC TGOGAGCAAT GGGTGCGCAG CGGTCACTTC GCCGAGGTGC 5580 

TGAAGTGGCT GGAGCCGCTG AGOGAGGOGG AACTCTGOGN GCAGTCGCGC CTGCTGGTGC 5640 

TGATGACCTA TGCCCTGACC CT6TOGCGGC GTTTCCACCA GGCGCGCTAC TGCTTGGACG 5700 

AACTGGTGGC GCGCTGCACC GGTCAGCCGG GCCTGGAGGA GCCGACCCGC CAGCTGCTGG 5760 

OGCTCAACCT GGAGCTGTTC CAGCAOGACC TCGCCTTCGA CCCCGGCCAG CGCTGGTCCG 5820 

ACCTGCTGGC OGCGGGOGTC GCCTCGGACA TCCGTGCCCT GGCGCTGAGC ATCCTCGCCT 5880 

ATCACCACCT GATGCAOGGC OGCCTGGAGC AGTCGATCCA GCTGGCGCTG GAGGCCAAGG 5940 

CGCTGCTGGC CAGCACOGGC CAGCTGTTCC TGGAGAGCTA CGCCGACCTG ATCATCGCCC 6000 

TGTGCAACCG CAACGCCGGG CGCGCCACCA GCGCGCGCAA GGACGTCTGC CTGGATTACC 6060 

AGCGCACCGA GCGCTCCTCG CCGGCCTGGG TCAACCGTGC CACCGCCATG GTGGTGGCGC 6120 

TGTACGAGCA GAACCAGCTG GCCGCCGCCC AGCAGCTGTG CGAGGACCTG ATGGCCATGG 6180 

TCACGTCGTC CTCGGCCACC GAGACCATCG CCACCGTGCA CATCACCCTG TCGCGCCTGC 6240 

TCCACCGGCG CCAGTCCCAG GGCCGCGCCA CGCGCCTGCT GGAGCAGCTG TCGCGCATCC 6300 

TGCAACTGGG CAACTACGCC OGCTTCGCCA GCCAGGCGGC GCAGGAGAGC ATGCGCCAGG 6360 

CCTATCTCGA OGGGCGCCCG GCGGCGCTCG ACGCACTGGC CCAACGCCTG GGTATCGAGG 6420 

AGCGCCTGGC CGCCGGGGAG TGGGAGAGGG TGCGGCCCTA TGAAGAGTGC TGGGAAC6CT 6480 

AOGGCCTGGC CGCCGTGTAC TGGCTGGTGA TGCGCGGCGC CCAGCCGCGC GCCTGCCGCA 6540 

TCCTCAAGGT GCTGGCGCAG GCGNTGNAGA AGAGCGAGAT GAAGGCCCGT GCGCTGGTGG 6600 

TGGAGGCCAA CCTGCTGGTG CTGAACGCCC CGCAGCTGGG GGCGGACGAG CAGGACAGGG 6660 

CCCTGCTGGC GCTGGTCGAG CGCTTCGGCA TCGTCAACAT CAACCGCTCG GTATTCGACG 6720 

AGGCGCCCGG CTTCGCCGAG GCGGTGTTCG GCCTGCTGCG CTCGGGCCGG CTGCAGGCGC 6780 

CGGAGGCCTA TCGC6AGGCC TATGCCGACT TCCTCCAGGG CACAGGCCAG GCGCCGCCGG 6840 

CGCTCCTGTC CGAGTCGCTG AAACAGCTTA CCGACAAGGA GGCGGCGATC TTCGCCTGCC 6900 

TGCTCAGGGG GCTGTCCAAC AGCGAGATCA GCGCCAGCAC CGGCATCGCC CTGTCCACCA 6960 

CCAAGTGGCA CCTGAAGAAC ATCTACTCGA AGCTGAGCCT CTCCGGGCGT ACCGAAGCCA 7020 

TCCTCGCCAT GCAGGCCCGC AACGGATAAT GCGCCATGCC CCTCCCCGGG GAGGGGGGAG 7080 

GGGCGCGCGC AACTGCTTAA TCTCCCGCCT GCOGGAAAAG CCGGCAAGCA ACCCCATTAG 7140 

TACAAGAAGA AATOGGGAGA TATCGCCATG TCTGTTTGGG TCACGTGGCC GGGCTTGGTC 7200 

AAGTTCGGCA CCCTGGGCAT CTATGCCGGC CTGATCACGC TCGCGCTTGA GCGCGACGTG 7260 

CTGTTCAAGA ACAACCTGTT CGACGTCGAC AACCTGCCCG CGGCCAACGC CAGCATCACC 7320 

TGTGATGCCC GCAGCCAGGT GGCGCGTACC GAGGACGGCA CCTGTAACAT CCTCGCCAAC 7380 

CCGGCCGAGG GCTCGGTGTA CCGCCGCTTC GGGCGCAACG TCGACCCCAG CGTGACCCAT 7440 

GGCGAGACCG AGGCCGACAC CCTGCTCAGT CCCAATCCGC GGGAGGTGAG TAACGTGCTG 7500 

ATGGCGCGTG GCGAGTTCAA GCCGGCGCCC AGCCTCAACT TCATCGCCGC CTCCTGGATC 7560 

CAGTTCATGG TGCATGACTG 6GTCGAACAC GGCCCCAACG CCGAAGCCAA CCCGATCCAG 7620 

GTGCCGCTGC CGGCTGGCGA CGCGCTCGGC TCCGGCAGCC TGTCCGTGCG CCGCACCCAG 7680 

CCCGACCCGA CCCGTACCCC GGGCGAGGCC GGCAAGCCGG CCACCTACCG CAACCACAAC 7740 

ACCCACTGGT GGGATGGCTC GCAGTTGTAT GGCAGCAGCA AGGACATCAA CGACAAGGTG 7800 

CGCGCCTTCG AGGGTGGCAA GCTGAAGATC AATCCCGACG GTACCCTGCC GACCGAGTTC 7860 

CTCAGCGGCA AGCCGATCAC CGGCTTCAAC GAGAACTGGT GGGTTGGCCT GAGCATGCTG 7920 

CACCAGCTGT TCACTAAGGA GCACAACGCC ATCGCGGCGA TGCTCCAGCA GAAGTACCCG 7980 

GACAAGGACG ACCAGTGGCT GTACGACCAT GCGCGCCTGG TCAACTCCGC GCTGATGGCC 8040 

AAGATCCACA CCGTGGAATG GACCCCGGCG GTGATCGCCA ACCCGGTCAC CGAACGCGCC 8100 

ATGTATGCCA ACTGGTGGGG CCTGCTGGGT TCCGGTCCGG AGCGTGACAA GTACCAGGAA 8160 

GAGGCGCGCA TGCTGCAGGA GGACCTGGCC AGCTCCAACT CCTTCGTCCT GCGCATTCTC 8220 

GGCATCGACG GCAGCCAGGC CGGCAGTTCG GCCATCGACC ATGCCCTGGC CGGCATCGTC 8280 

GGCTCGACCA ACCCGAACAA CTACGGCGTG CCCTACACCC TGACCGAGGA GTTCGTCGCG 8340 

GTCTACCGCA TGCACCCGCT GATGCGCGAC AAGGTCGATG TCTACGACAT CGGCTCGAAC 8400 

ATCATCGCGC GCAGCGTGCC GCTGCAGGAG ACCCGCGATG CCGACGCCGA GGAGCTGCTG 8460 
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GCGGACGAGA ATCCCGAGCG CCTGTCGTAC TCCTTCGGCA TCACCAACCC GGGCTCGCT6 8520 

ACCCTCAACA ACTACCCGAA CTTCCTGCGC AACCTGTCCA TGCCGCTGGT CGGCAACATC 8580 

6ACCTGGC6A CCATOGACCT GCTGTGTGAC CGCGAGCGCG GGGTGCCGCG CTACAAC6AG 8640 

TTCCGCCGOG AGATCGGCCT CAACCCGATC ACCAAGTTGG AGGACCTGAC CACC6ACCCG 8700 

GCCACCCTG6 CCAACCTCAA GCGCATCTAC GGCAAC6ACA TCGAGAAGAT TGACACCCTG 8760 

GTOGGCATGC TGGCOGAGAC CGTGCGTCCG GACGGCTTCG CCTTCGGOGA GACGGCCTTC 8820 

CAGATCTTCA TCATGAACGC CTOGCGGGGC CTGATGACCG ACCGCTTCTA TACCAAGGAC 8880 

TACCGCCCGG AGATCTACAC CGCOGAGGGC CTGGCCTGGG TCGAGAACAC CACCATGGTC 8940 

GACGTGCTCA AACGCCACAA TCCGCAGCTG GTCAACAGCC TGGTTGGCGT GGAAAACGCC 9000 

TTCAAACCCT GGGGCCTGAA CATCCCGGCC GACTACGAGA GCTGGCCGGG CAAGGCCAAG 9060 

CAGGACAACC TGTGGGTCAA CGGCGCCNTG CGCACCCAGT ACGCCGCAGG CCAGCTGCCG 9120

GCCATTCCGC CGGTGGACGT CGGCGGCCTG ATCAGTTOGG TGCTGTGGAA GAAGGTGCAG 9180 

ACCAANTCOG ACGTGGCGCC GGCCGGCTAC GAGAAGGCCA TGCACCCGCA TGGCGTGATG 9240 

GCCAAGGTCA AGTTCACCGC CGTGCOGGGG CACCCCTACA CCGGCCTGTT CCAGGGTGCC 9300 

GACAGCGGCC TGCTGOGCCT GTCGGTGGCC GGCGACCOGG CAACCAACGG CTTCCAGCCG 9360 

GGTCTGGCGT GGAAGGCCTT CGTCGACGGG AAGCOGTCGC AGAACGTCTC CGCGCTCTAC 9420 

ACCCTGAGCG GGCAGGGCAG CAACCACAAC TTCTTCGCCA ACGAGCTGTC GCAGTTCGTC 9480 

CTGCCGGAGA CCAACGATAC CCTGGGCACC ACGCTGCTGT TCTCGCTGGT CAGCCTCAAG 9540 

CCGACCTTGC TGCGCGTGGA CGACATGGCC GAAGTGACCC AGACCGGCCA GGCCGTGACT 9600

TCGGTCAAGG CGCCGACGCA GATCTACTTC GTGCCCAAGC CGGAGCTGCG CAGCCTGTTC 9660 

TCCAGTGCGG CGCATGACTT CCGCAGCGAC CTGACGAGCC TCACCGCCGG CACCAAGCTG 9720 

TACGACGTCT ACGCTACCTC GATGGAGATC AAGACCTCGA TCCTGCCGTC GACCAATCGT 9780 

AGCTACGCCC AGCAACGGCG CAACAGCGCG GTGAAGATCG GCGAGATGGA GCTGACCTCG 9840

CCGTTCATCG CCTCGGCCTT CGGCGACAAC GGGGTGTTCT TCAAGCACCA GCGTCACGAA 9900 

GACAAATAAG GGTCATCCCT TGCTGAACAG CCCCGGCCCG TGCCGGGGCT TTTTTGTGCA 9960 

CGCCTTACGT CCATCACACT TCTGCGCCAG GCTGTGCTGC CGCCTGCAAA ATCGGGACTG 10020 

CAGTTTTTGC GCAAATCCGT TAACTTGGCG CCTCGGCCAT GCCATAAAAA CAACAAGAAC 10080 

AACAGCAAGA TGGATCTTCT GTTCGGGGAA CGCATCCGCC CATGTCCACC GATACCCACG 10140 

CCGCCCTGAC GGCTCCCGCA AGCCCCGCCT TGCGCCCGCT GCCCTTCGCC TTCGCCAAAC 10200

GCCACGGCGT GCTGCTGCGC GAGCCCTTCG GCCAGGTCCA GCTGCAGGTG CGCCGCGGTG 10260

CCAGCCTGGC CGCCGTGCAG GAGGCCCAGC GCTTCGCCGG CCGCGTGCTG CCGCTGCACT 10320

GGCTGGAGCC CGAGGCCTTC GAGCAGGAGC TGGCCCTGGC CTACCAGCGC GACTCCTCCG 10380

AGGTGCGGCA GATGGCCGAG GGCATGGGTG CCGAACTTGA CCTAGCCAGC CTGGCCGAAC 10440 

TCACTCCCGA ATCCGGCGAC CTGCTGGAGC AGGAAGATGA CGCGCCGATC ATCCGCCTGA 10500 

TCAACGCCAT CCTCAGCGAG GCGATCAAGG CCGGCGCCTC GGACATCCAC CTGGAAACCT 10560

TCGAGAAACG CCTGGTGGTG CGCTTTCGCG TCGACGGCAT CCTCCGCGAA GTGATCGAAC 10620

CGCGCCGCGA GCTGGCGGCG CTGCTGGTCT CGCGGGTCAA GGTCATGGCG CGCCTGGACA 10680

TCGCCGAGAA GCGCGTACCG CAGGACGGCC GTATTTCGCT CAAGGTCGGC GGTCGCGAGG 10740 

TGGATATCCG CGTCTCCACC CTGCCGTCGG CCAACGGCGA GCGGGTGGTG CTGCGTCTGC 10800 

TCGACAAGCA GGCCGGGCGC CTGTCGCTCA CGCATCTGGG CATGAGCGAG CGCGACCGCC 10860 

GCCTGCTCGA CGACAACCTG CGCAAGCCGC ACGGCATCAT CCTAGTCACC GGCCCCACCG 10920

GCTCGGGCAA GACCACCACC CTGTACGCCG GCCTGGTCAC CCTCAACGAC CGCTCGCGCA 10980 

ATATCCTCAC GGTGGAAGAC CCGATCGAGT ACTACCTGGA AGGCATCGGC CAGACCCAGG 11040 

TCAACCCGCG GGTGGACATG ACCTTCGCCC GCGGCCTGCG CGCCATCCTG CGCCAGGACC 11100 

CGGACGTGGT GATGGTCGGC GAGATCCGCG ACCAGGAGAC CGCCGACATC GCCGTGCAGG 11160 

CCTCGCTCAC CGGCCACCTG GTGCTCTCCA CCCTGCACAC CAACAGCGCC GTCGGCGCCG 11220

TCACCCGCCT GGTGGACATG GGCGTCGAGC CCTTCCTGCT GTCGTCGTCC CTGCTCGGCG 11280 

TGCTGGCCCA GCGCCTGGTG CGCGTGCTCT GCGTGCACTG CCGCGAGGCG CGCCCGGCTG 11340 

ACGCGGCCGA GTGCGGCCTG CTCGGCCTCG ACCCGCACAG CCAGCCCCTG ATCTACCACG 11400 

CCAAGGGCTG CCCGGAGTGC CACCAGCAGG GCTACCGCGG CCGTACTGGC ATCTACGAGC 11460 

TGGTGATCTT CGACGACCAG ATGCGCACCC TGGTGCACAA CGGCGCCGGT GAGCAGGAGC 11520

TGATTCGCCA CGCCCGCAGC CTCGGCCCGA GCATCCGCGA CGATGGCCGG CGCAAGGTGC 11580 

TGGAAGGGGT GACCAGCCTG GAAGAAGTGT TGCGCGTGAC CCGGGAAGAC TGATGGCCGC 11640 

CTTCGAATAC ATCGCCCTGG ATGCCAGGGG CCGCCAGCAG AAGGGCGTGC TGGAGGGCGA 11700 

CAGCGCCCGC CAGGTGCGCC AGCTGCTGCG CGACAAACAG TTGTCGCCGC TGCAGGTCGA 11760 

GCCGGTACAG CGCAGGGAGC AGGCCGAGGC TGGTGGCTTC AGCCTGCGCC GTGGCCTGTC 11820 

GGCGCGCGAC CTGGCGCTGG TCACCCGTCA GCTGGCGACC CTGATCGGCG CCGCGCTGCC 11880 

CATCGAGGAA GCGCTGCGCG CCGCCGCCGC GCAGTCGCGC CAGCCGCGCA TCCAGTCGAT 11940 

GCTGTTGGCG GTGCGCGCCA AGGTGCTCGA GGGCCACAGC CTGGCCAAGG CCCTGGCCTC 12000

CTACCCGGCG GCCTTCCCCG AGCTGTACCG CGCCACGGTG GCGGCCGGCG AGCATGCGGG 12060

GCACCTGGCG CCGGTGCTGG AGCAGCTGGC CGACTACACC GAGCAGCGCC AGCAGTCGCG 12120

GCAGAAGATC CAGATGGCGC TGCTCTACCC GGTGATCCTG ATGCTCGCTT CGCTGGGCAT 12180

CGTCGGTTTT CTGCTCGGCT ACGTGGTGCC GGATGTGGTG CGGGTGTTCG TCGACTCCGG 12240

GCAGACCCTG CCGGCGCTGA CCCGCGGGCT GATTTTCCTC AGCGAGCTGG TCAAGTCCTG 12300

GGGCGCCCTG GCCATCGTCC TGGCGGTGCT CGGCGTGCTC GCCTTTCGCC GCGCCTTGCG 12360

GAGCGAGGAT CTGCGCCGGC GCTGGCATGC CTTCCTGCTG CGCGTGCCGC TGGTCGGTGG 12420

GCTGATCGCC GCCACCGAGA CGGCACGCTT CGCCTCGACC CTGGCCATCC TGGTGCGCAG 12480

CGGCGTGCCA CTGGTGGAGG CGCTGGCCAT CGGCGCCGAG GTGGTGTCCA ACCTGATCAT 12540

CCGCAGCGAC GTGGCCAACG CCACCCAGCG CGTGCGCGAG GGCGGCAGCC TGTCGCGCGC 12600

GCTGGAAGCC AGCCGGCAGT TTCCGCCGAT GATGCTGCAC ATGATCGCCA GCGGCGAGCG 12660

TTCCGGCGAG CTGGACCAGA TGCTGGCGCG CACGGCGCGC AACCAGGAAA ACGACCTGGC 12720

GGCCACCATC GGCCTGCTGG TGGGGCTGTT CGAGCCGTTC ATGCTGGTAT TCATGGGCGC 12780

GGTGGTGCTG GTGATCGTGC TGGCCATCCT GCTGCCGATT CTTTCTCTGA ACCAACTGGT 12840

GGGTTGATAG CGATGTACAA ACAGAAAGGC TTCACGCTGA TCGAAATCAT GGTGGTGGTG 12900

GTCATCCTCG GCATTCTCGC TGCCCTGGTG GTGCCGCAGG TGATGGGCCG CCCGGACCAG 12960

GCGAAGGTCA CCGCGGCGCA GAACGACATC CGCGCCATCG GCGCCGCGCT GGACATGTAC 13020

AAGCTGGACA ACCAGAACTA CCCGAGCACC CAGCAGGGCC TGGAGGCCCT GGTGAAGAAA 13080

CCCACCGGCA CGCCGGCGGC GAAGAACTGG AACGCCGAGG GCTACCTGAA GAAGCTGCCG 13140

GTCGACCCCT GGGGCAACCA GTACCTGTAC CTGTCGCCGG GCACCCGCGG CAAGATCGAC 13200

CTGTATTCGC TGGGCGCCGA CGGCCAGGAA GGCGGCGAGG GGACCGACGC CGACATCGGC 13260

AACTGGGATC TCTGACTCGC AATGCAGCGG GGGCGCGGTT TCACTCTGAT CGAGCTGCTG 13320

GTGGTGCTGG TGCTGCTGGG CGTGCTCACC GGCCTCGCCG TGCTCGGCAG CGGGATCGCC 13380

AGCAGCCCCG CGCGCAAGCT GGCGGACGAG GCCGAGCGCC TGCAGTCGCT GCTGCGGGTG 13440

CTGCTCGACG AGGCGGTGCT GGACAACCGC GAGTATGGCG TACGCTTCGA CGCCCGGAGC 13500

TACCGGGTGC TGCGCTTCGA GCCGCGCACG GCGCGCTGGG AGCCGCTCGA CGAGCGCGTG 13560

CACGAGCTGC CGGAGTGGCT CGAGCTGGAG ATCGAGGTCG ACGAGCAGAG TGTCGGGCTG 13620

CCCGCCGCCC GTGGCGAGCA GGACAAAGCC GCGGCCAAGG CGCCACAGCT GCTGCTGCTC 13680

TCCAGTGGCG AGCTGACCCC CTTCGCCCTG CGCCTGTCCG CCGGCCGCGA GCGCGGCGCG 13740

CCGGTGCTGA CGCTGGCCAG CGACGGCTTC GCCGAGCCCG AGCTGCAGCA GGAAAAGTCC 13800

CGATGAAGCG CGGCCGCGGC TTCACCCTGC TCGAGGTGCT GGTGGCCCTG GCGATCTTCG 13860

CCGTGGTCGC CGCCAGCGTG CTCAGCGCCA GCGCTCGCTC GCTGAAGACC GCCGCGCGCC 13920

TGGAGGACAA GACCTTCGCC ACCTGGCTGG CGGACAACCG CCTGCAGGAG CTGCAGCTGG 13980

CCGACGTGCC GCCGGGCGAG GGCCGCGAGC AGGGCGAGGA GAGCTACGCC GGGCGGCGCT 14040

GGCTGTGGCA GAGCGAGGTG CAGGCCACCA GCGAGCCGGA GATGCTGCGT GTCACCGTAC 14100

GGGTGGCGCT GCGGCCGGAG CGCGGGCTGC AGGGCAAGAT CGAAGACCAT GCCCTGGTGA 14160

CCCTGAGTGG CTTCGTCGGG GTCGAGCCAT GAGGCAGCGC GGCTTCACCC TGCTGGAAGT 14220

GCTGATCGCC ATCGCCATCT TCGCCCTGCT GGCCATGGCC ACCTACCGCA TGCTCGACAG 14280

CGTGCTGCAG ACCGATCGTG GCCAGCGCCA GCAGGAGCAG CGTCTGCGCG AGCTGACGCG 14340

GGCCATGGCA GCTTTCGAAC GCGACCTGCT GCAGGTGCGC CTGCGTCCGG TGCGCGACCC 14400

GCTGGGCGAC CTGCTGCCAG CCCTGCGCGG CAGCAGTGGC CGCGACACCC AGCTGGAGTT 14460

CACCCGCAGC GGCTGGCGCA ACCCGCTCGG CCAGCCGCGC GCCACCCTAC AGCGGGTGCG 14520

CTGGCAGCTC GAAGGCGAGC GCTGGCAGCG CGCTTACTGG ACGGTGCTGG ACCAGGCCCA 14580

GGACAGCCAG CCGCGGGTGC AGCAGGCGCT GGATGGCGTG CGCCGCTTCG ACTTGCGCTT 14640

TCTCGACCAG GAGGGGCGCT GGCTGCAGGA CTGGCCGCCG GCCAACAGTG CTGCCGACGA 14700

GGCCCTGACC CAGCTGCCGC GTGCCGTCGA GCTGGTCGTC GAGCACCGCC ATTACGGTGA 14760

ACTGCGCCGT CTCTGGCGCT TGCCCGAGAT GCCGCAGCAG GAACAGATCA CGCCGCCCGG 14820

GGGCGAGCAG GGCGGTGAGC TGCTGCCGGA AGAGCCGGAG CCCGAGGCAT GAGCCGGCAG 14880

CGCGGCGTGG CACTGATCAC CGTGCTGCTG GTGGTGGCGC TGGTGACCGT GGTCTGCGCG 14940

GCCCTGCTGC TGCGCCAGCA GCTGGCCATC CGCAGCACCG GCAACCAGCT GCTGGTGCGC 15000

CAGGCCCAGT ACTACGCCGA AGGCGGCGAG CTGCTGGCCA AGGCCCTGCT GCGTCGCGAC 15060

CTGGCCGCCG ACCAGGTCGA TCATCCCGGC GAGCCCTGGG CCAACCCCGG CCTGCGCTTC 15120

CCCCTGGATG AGGGCGGCGA GCTGCGCCTG CGCATCGAGG ACCTGGCCGG ACGTTTCAAC 15180

CTCAACAGCC TGGCCGCCGG TGGTGAGGCC GGTGAGTTGG CGCTGCTGCG CCTGCGGCGC 15240

CTGCTGCAGC TGCTGCAGCT GACCCCGGCC TATGCCGAGC GCCTGCAGGA CTGGCTCGAC 15300

GGCGATCAGG AGGCCAGCGG CATGGCCGGC GCCGAGGATG ACCAGTACCT GCTGCAGAAA 15360

CCGCCCTACC GTACCGGCCC CGGGCGCATT GCCGAGGTGT CGGAGCTGCG CCTGCTGCTG 15420

GGCATGAGCG AGGCCGACTA CCGCCGCCTG GCCCCCTTCG TCAGCGCCCT GCCGAGCCAG 15480

GTCGAGCTGA ACATCAACAC CGCCAGCGCC CTGGTGCTGG CTTGCCTGGG CGAGGGCATN 15540

CCCGAGGCGG TGCTCGAGGC CGCCATCGAN GCTCGCGGCC GCAGCGGCTA TCGCGAGCCC 15600

GCTGCCTTCG TCCAGCANCT TGCCAGCTAC GGCGTCAGCC CGCAGGGGCT GGGCATCGCC 15660

AGCCAGTATT TCCGTGTCAC CACCGAGGTG CTGCTGGGTG AGCGGCGCCA GGTGCTGGCC 15720 

AGTTATCTGC AACGTGGTAA TGATGGGCGC GTCCGCCTGA TGGCGCGCGA TCTGGGGCAG 15780

GAGGGCCTGG CGCCCCCACC CGTCGAGGAG TCCGAGAAAT GAGTCTGCTC ACCCTGTTTC 15840 

TGCCGCCCCA GGCCTGCACC GAGGCGAGCG CCGACATGCC GGTGTGGTGC GTCGAGAGCG 15900

ACAGCTGCCG TCAGCTGCCC TTCGCCGACC CCTTCCCGGC CGACGCGCGG GTCTGGCGCT 15960

TGGTGCTGCC GGTGGAGGCG GTGACCACCT GTGTCGTGCA GTTGCCGACC ACCAAGGCAC 16020

GCTGGCTGGC CAAGGCCCTG CCGTTCGCCG TCGAGGAGCT GCTGGCCGAG GAGGTGGAGC 16080

AGTTTCACCT GTGCGTCGGT AGCGCGCTGG TCGATGGTCG TCATCGTGTT CATGCCCTGC 16140

GCCGCGAGTG GCTGGCCGGC TGGCTGGCGC TGTGCGGCGA GCGGCCGCCG CAGTGGATCG 16200 

AGGTGGACGC CGACCTGTTG CCGGAGGAGG GTAGCCAGCT GCTCTGCCTG GGCGAGCGCT 16260

GGTTGCTCGG CGGGTCGGGC GAGGCGCGCC TGGCCCTGCG TGGGGAGGAC TGGCCGGAGC 16320

TGGCGGCGCT CTGTCCGCCG CCCCGGCAAG CCTATGTGCC GCCCGGGCAG GCGGCGCCGC 16380

CGGGCGTCGA GGCCTGCGAG ACGCTGGAGC AGCCGTGGCT CTGGCTGGCC GCGCAGAAGT 16440

CCGGCTGCAA CCTGGCCCAG GGGCCTTTCG CCCGTCGCGA GCCTTCCGGC CAGTGGCAGC 16500

GCTGGCGGCC GCTGGCGGGG CTGCTCGGTC TCTGGCTGGT GCTGCAKTGG GGCTTCAACC 16560

TTGCCCANGG CTGGCAGCTG CAGCGCGAGG GTGAACGCTA TGCCGTGGCC AACGAGGCGC 16620 

TGTATCGCGA GCTGTTCCCC GAGGATCGCA AGGTGATCAA CCTGCGTGCG CAGTTCGACC 16680

AGCACCTGGC CGAGGGGGCT GGGAGCGGCC AGAGCCAGTT GCTGGCCCTG CTCGATCAGG 16740 

CCGCCGCGGC CATCGGCGAA GGGGGGGCGC AGGTGCAGGT GGATCAGCTC GACTTCAACG 16800

CCCAGCGTGG CGACCTGGCC TTCAACCTGC GTGCCAGCGA CTTCGCCGCG CTGGAAAGCC 16860

TGCGGGCGCG CCTGCAGGAG GCCGGCCTGG CGGTGGACAT GGGCTCGGCG AGCCGCGAGG 16920 

ACAACGGCGT CAGTGCGCGC CTGGTGATCG GGGGTAACGG ATGAACGGCC TGCTCATGCA 16980

ATGGCAAGCG CGCCTGGCGC AGAACCCTTT GATGCTGCGC TGGCAGGGCC TGCCGCCACG 17040

CGACCGGCTG GCCCTGGGCC TGCTCGCTGC CTTCCTGTTG CTGGTGCTGC TGTACCTGTT 17100

GCTGTGGCGG CCGGTCAGCC AGAACCTGGA GCGGGCGCGC GGCTTCCTGC AGCAGCAGCG 17160

TACGCTGCAC GCCTACCTGC AGGAGCATGC ACCGCAGGTG CGGGCACGGC AGGTCGCACC 17220

GCAGGCCAGT ATCGAGCCTG CCGCGCTGCA GGGGTTGGTG ACCGCCAGTG CCGCCAGCCA 17280

GGGGCTGAAT GTCGAGCGTC TGGACAACCA GGGTGATGGT GGCCTGCAGG TGAGCCTGCA 17340

GCCGGTCGAG TTCGCCCGTC TGCTGCAGTG GCTGGTGAGC CTGCAGGAGC AGGGCGTGCG 17400

CGTCGAAGAG GCCGGTCTGG AACGTGCCGA CAAGGGGCTG GTGAGCAGCC GCCTGCTGCT 17460 

GCGTGCCGGT TGAGCCCGGC TGCACCAGGC GAGTGCGTCG GCACTCGCGC GGAGCATCTG 17520 

GAAAACCCGT CCGCGAAGAA AAATTCAAGC AGGGTGTTGA CTTAGCTATG ACCTCTNCGT 17580

CAATTGCGCG CCTCGCANGC TAACGGCTGG AT 17612 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2634 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

ATGGAAGATC GCAAGCCGCC TGCCGCGGCT CCCGTGGGGT TTGCGCGCGC GGAGCTGCTG 60

GAGCTGCTCT GCCGCTGCGA GCAGTTTCCC CTGACCCTGC TGCTGGCGCC CGCCGGTTCC 120 

GGCAAGTCGA CCCTGCTGGC CCAGTGGCAG GCCAGCCGGC CCTTCGGCAG TGTGGTGGAC 180 

TATCCACTGC AGGCGCGTGA CAACGAGCCG GTACGCTTCT TCCGCCACCT GGCCGAAAGC 240 

ATCCGCGCCC AGGTCGAGGA CTTCGACCTG TCCTGGTTCA ACCCCTTCGC CGCCGAGATG 300 

CACCAGGCGC CCGAGGTGCT CGGCGAGTAC CTGGCCGACG CCCTCAATCG CATCGAGAGC 360 

CGCCTCTACC TCGTCCTCGA CGACTTCCAG TGCATCGGCC AGCCGATCAT CCTCGACGTG 420 

CTCTCGGCCA TGCTCGAACG CCTGGCGGGC AACACCCGGG TCATTCTGTC CGGGCGCAAC 480 

CATCCGGGGT TCTCCCTCAG CCGCCTGAAA CTGGACAACA AGCTGCTGTG CATCGACCAG 540 

CACGACATGC GCCTGTCGCC AGTGCAGATC CAACACCTCA ATGCCTACCT GGGCGGTCCC 600 

GAGCTCAGCC CGGCCTATGT CGGCAGCCTG ATGGCCATGA CCGAGGGCTG GATGGTCGGG 660 

GTGAAGATGG CCCTGATGGC CCATGCGCGC TTCGGCACCG AGGCCCTGCA GCGCTTCGGT 720

GGCGGCCATC CGGAGATAGT CGACTACTTC GGCCATGTGG TGCTGAAGAA GCTGTCGCCG 780

CAGCTGGACG ACTTCCTGTT GTGCAGCGCG ATCTTCGAGC GCTTCGACGG CGAGCTATGC 840

GACCGGGTGC TGGATCGCAG CGGTTCGGCC CTGCTGCTGG AGGACCTGGC CGCGCGCGAG 900

CTGTTCATGC TGCCGGTGGA CGAGTATCCC GGCTGCTACC GCTACCACGC CCTGTTGCAC 960

GATTTCCTCG CCCGGCGCCT GGCCGTGCAC AAGCCACAGG AAGTGGCGCA ACTGCACCGG 1020

CGGGCGGCCC TGGCGCTGCA GCAGCGTGGC GACCTGGAGC TGGCCCTGGA GCATGCCCAG 1080

CGCAGTGGCG ACCGCGCGTT GTTCCAAAGC ATGCTGGGCG AGGCCTGCGA GGAATGGGTG 1140

CGCAGCGGTC ACTTCGCCGA GGTGCTGAAG TGGCTGGAGC CGCTGAGCGA GGCGGAACTC 1200

TGCGNGCAGT CGCGCCTGCT GGTGCTGATG ACCTATGCCC TGACCCTGTC GCGGCGTTTC 1260

CACCAGGCGC GCTACTGCTT GGACGAACTG GTGGCGCGCT GCACCGGTCA GCCGGGCCTG 1320

GAGGAGCCGA CCCGCCAGCT GCTGGCGCTC AACCTGGAGC TGTTCCAGCA CGACCTGGCC 1380

TTCGACCCCG GCCAGCGCTG GTCCGACCTG CTGGCCGCGG GCGTCGCCTC GGACATCCGT 1440

GCCCTGGCGC TGAGCATCCT CGCCTATCAC CACCTGATGC ACGGCCGCCT GGAGGAGTCG 1500

ATCCAGCTGG CGCTGGAGGC GAAGGCGCTG CTGGCCAGGA CCGGCCAGCT GTTCCTGGAG 1560

AGCTACGCCG ACCTGATCAT CGCCCTGTGC AACCGGAACG CCGGGCGCGC GACCAGCGCG 1620

CGCAAGGACG TCTGCCTGGA TTACCAGCGC ACCGAGCGCT CCTCGCCGGC CTGGGTCAAC 1680

CGTGCCACCG CCATGGTGGT GGCGCTGTAC GAGCAGAACC AGCTGGCCGC CGCCCAGCAG 1740

CTGTGCGAGG ACCTGATGGC CATGGTCACG TCGTCCTCGG CCACCGAGAC CATCGCCACC 1800

GTGCACATCA CCCTGTCGCG CCTGCTCCAC CGGCGCCAGT CCCAGGGCCG CGCCACGCGC 1860

CTGCTGGAGC AGCTGTCGCG CATCCTGCAA CTGGGCAACT ACGCCCGCTT CGCCAGCCAG 1920

GCGGCGCAGG AGAGCATGCG CCAGGCCTAT CTCGACGGGC GCCCGGCGGC GCTCGACGCA 1980

CTGGCCCAAC GCCTGGGTAT CGAGGAGCGC CTGGCCGCCG GGGAGTGGGA GAGGGTGCGG 2040

CCCTATGAAG AGTGCTGGGA ACGCTACGGC CTGGCCGCCG TGTACTGGCT GGTGATGCGC 2100

GGCGCCGAGC CGCGCGCCTG CCGCATCCTC AAGGTGCTGG CGCAGGCGNT GNAGAACAGC 2160

GAGATGAAGG CCCGTGCGCT GGTGGTGGAG GCCAACCTGC TGGTGCTGAA CGCCCCGGAG 2220

CTGGGGGCGG ACGAGCAGGA CAGGGCCCTG CTGGCGCTGG TCGAGCGCTT CGGCATCGTC 2280

AACATCAACC GCTCGGTATT CGACGAGGCG CCCGGCTTCG CCGAGGCGGT GTTCGGCCTG 2340

CTGCGCTCGG GCCGGCTGCA GGCGCCGGAG GCCTATCGCG AGGCCTATGC CGACTTCCTC 2400

CAGGGCACAG GCCAGGCGCC GCCGGCGCTC CTGTCCGAGT CGCTGAAACA GCTTACCGAC 2460

AAGGAGGCGG CGATCTTCGC CTGCCTGCTC AGGGGGCTGT CCAACAGCGA GATCAGCGCC 2520

AGCACCGGCA TCGCCCTGTC CACCACCAAG TGGCACCTGA AGAACATCTA CTCGAAGCTG 2580

AGCCTCTCCG GGCGTACCGA AGCCATCCTC GCGATGCAGG CCCGCAACGG ATAA 2634



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 877 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 



Met Glu Asp Arg Lys Pro Pro Ala Ala Ala Pro Val Gly Phe Ala Arg

1 5 10 15

Ala Glu Leu Leu Glu Leu Leu Cys Arg Cys Glu Gln Phe Pro Leu Thr

20 25 30

Leu Leu Leu Ala Pro Ala Gly Ser Gly Lys Ser Thr Leu Leu Ala Gln

35 40 45

Trp Gln Ala Ser Arg Pro Phe Gly Ser Val Val His Tyr Pro Leu Gln

50 55 60

Ala Arg Asp Asn Glu Pro Val Arg Phe Phe Arg His Leu Ala Glu Ser

65 70 75 80

Ile Arg Ala Gln Val Glu Asp Phe Asp Leu Ser Trp Phe Asn Pro Phe

85 90 95

Ala Ala Glu Met His Gln Ala Pro Glu Val Leu Gly Glu Tyr Leu Ala

100 105 110

Asp Ala Leu Asn Arg Ile Glu Ser Arg Leu Tyr Leu Val Leu Asp Asp

115 120 125

Phe Gln Cys Ile Gly Gln Pro Ile Ile Leu Asp Val Leu Ser Ala Met

130 135 140

Leu Glu Arg Leu Ala Gly Asn Thr Arg Val Ile Leu Ser Gly Arg Asn

145 150 155 160

His Pro Gly Phe Ser Leu Ser Arg Leu Lys Leu Asp Asn Lys Leu Leu

165 170 175

Cys Ile Asp Gln His Asp Met Arg Leu Ser Pro Val Gln Ile Gln His

180 185 190

Leu Asn Ala Tyr Leu Gly Gly Pro Glu Leu Ser Pro Ala Tyr Val Gly

195 200 205

Ser Leu Met Ala Met Thr Glu Gly Trp Met Val Gly Val Lys Met Ala

210 215 220

Leu Met Ala His Ala Arg Phe Gly Thr Glu Ala Leu Gln Arg Phe Gly

225 230 235 240

Gly Gly His Pro Glu Ile Val Asp Tyr Phe Gly His Val Val Leu Lys

245 250 255

Lys Leu Ser Pro Gln Leu His Asp Phe Leu Leu Cys Ser Ala Ile Phe

260 265 270

Glu Arg Phe Asp Gly Glu Leu Cys Asp Arg Val Leu Asp Arg Ser Gly

275 280 285

Ser Ala Leu Leu Leu Glu Asp Leu Ala Ala Arg Glu Leu Phe Met Leu

290 295 300

Pro Val Asp Glu Tyr Pro Gly Cys Tyr Arg Tyr His Ala Leu Leu His

305 310 315 320

Asp Phe Leu Ala Arg Arg Leu Ala Val His Lys Pro Gln Glu Val Ala

325 330 335

Gln Leu His Arg Arg Ala Ala Leu Ala Leu Gln Gln Arg Gly Asp Leu

340 345 350

Glu Leu Ala Leu Gln His Ala Gln Arg Ser Gly Asp Arg Ala Leu Phe

355 360 365

Gln Ser Met Leu Gly Glu Ala Cys Glu Gln Trp Val Arg Ser Gly His

370 375 380

Phe Ala Glu Val Leu Lys Trp Leu Glu Pro Leu Ser Glu Ala Glu Leu

385 390 395 400

Cys Xaa Gln Ser Arg Leu Leu Val Leu Met Thr Tyr Ala Leu Thr Leu

405 410 415

Ser Arg Arg Phe His Gln Ala Arg Tyr Cys Leu Asp Glu Leu Val Ala

420 425 430

Arg Cys Thr Gly Gln Pro Gly Leu Glu Glu Pro Thr Arg Gln Leu Leu

435 440 445

Ala Leu Asn Leu Glu Leu Phe Gln His Asp Leu Ala Phe Asp Pro Gly

450 455 460

Gln Arg Trp Ser Asp Leu Leu Ala Ala Gly Val Ala Ser Asp Ile Arg

465 470 475 480

Ala Leu Ala Leu Ser Ile Leu Ala Tyr His His Leu Met His Gly Arg

485 490 495

Leu Glu Gln Ser Ile Gln Leu Ala Leu Glu Ala Lys Ala Leu Leu Ala

500 505 510

Ser Thr Gly Gln Leu Phe Leu Glu Ser Tyr Ala Asp Leu Ile Ile Ala

515 520 525

Leu Cys Asn Arg Asn Ala Gly Arg Ala Thr Ser Ala Arg Lys Asp Val

530 535 540

Cys Leu Asp Tyr Gln Arg Thr Glu Arg Ser Ser Pro Ala Trp Val Asn

545 550 555 560

Arg Ala Thr Ala Met Val Val Ala Leu Tyr Glu Gln Asn Gln Leu Ala

565 570 575

Ala Ala Gln Gln Leu Cys Glu Asp Leu Met Ala Met Val Thr Ser Ser

580 585 590

Ser Ala Thr Glu Thr Ile Ala Thr Val His Ile Thr Leu Ser Arg Leu

595 600 605

Leu His Arg Arg Gln Ser Gln Gly Arg Ala Thr Arg Leu Leu Glu Gln

610 615 620

Leu Ser Arg Ile Leu Gln Leu Gly Asn Tyr Ala Arg Phe Ala Ser Gln

625 630 635 640

Ala Ala Gln Glu Ser Met Arg Gln Ala Tyr Leu Asp Gly Arg Pro Ala

645 650 655

Ala Leu Asp Ala Leu Ala Gln Arg Leu Gly Ile Glu Glu Arg Leu Ala

660 665 670

Ala Gly Glu Trp Glu Arg Val Arg Pro Tyr Glu Glu Cys Trp Glu Arg

675 680 685

Tyr Gly Leu Ala Ala Val Tyr Trp Leu Val Met Arg Gly Ala Gln Pro

690 695 700

Arg Ala Cys Arg Ile Leu Lys Val Leu Ala Gln Ala Xaa Xaa Asn Ser

705 710 715 720

Glu Met Lys Ala Arg Ala Leu Val Val Glu Ala Asn Leu Leu Val Leu

725 730 735

Asn Ala Pro Gln Leu Gly Ala Asp Glu Gln Asp Arg Ala Leu Leu Ala

740 745 750

Leu Val Glu Arg Phe Gly Ile Val Asn Ile Asn Arg Ser Val Phe Asp

755 760 765

Glu Ala Pro Gly Phe Ala Glu Ala Val Phe Gly Leu Leu Arg Ser Gly

770 775 780

Arg Leu Gln Ala Pro Glu Ala Tyr Arg Glu Ala Tyr Ala Asp Phe Leu

785 790 795 800

Gln Gly Thr Gly Gln Ala Pro Pro Ala Leu Leu Ser Glu Ser Leu Lys

805 810 815

Gln Leu Thr Asp Lys Glu Ala Ala Ile Phe Ala Cys Leu Leu Arg Gly

820 825 830

Leu Ser Asn Ser Glu Ile Ser Ala Ser Thr Gly Ile Ala Leu Ser Thr

835 840 845

Thr Lys Trp His Leu Lys Asn Ile Tyr Ser Lys Leu Ser Leu Ser Gly

850 855 860

Arg Thr Glu Ala Ile Leu Ala Met Gln Ala Arg Asn Gly

865 870 875
865 870 875 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 513 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

ATGAACGGCC TGCTCATGCA ATGGCAAGCG CGCCTGGCGC AGAACCCTTT GATGCTGCGC 60 

TGGCAGGGCC TGCCGCCACG CGACCGGCTG GCCCTGGGCC TGCTCGCTGC CTTCCTGTTG 120 

CTGGTGCTGC TGTACCTGTT GCTGTGGCGG CCGGTCAGCC AGAACCTGGA GCGGGCGCGC 180 

GGCTTCCTGC AGCAGCAGCG TACGCTGCAC GCCTACCTGC AGGAGCATGC ACCGCAGGTG 240 

CGGGCACGGC AGGTCGCACC GCAGGCCAGT ATCGAGCCTG CCGCGCTGCA GGGGTTGGTG 300 

ACCGCCAGTG CCGCCAGCCA GGGGCTGAAT GTCGAGCGTC TGGACAACCA GGGTGATGGT 360 

GGCCTGCAGG TGAGCCTGCA GCCGGTCGAG TTCGCCCGTC TGCTGCAGTG GCTGGTGAGC 420 

CTGCAGGAGC AGGGCGTGCG CGTCGAAGAG GCCGGTCTGG AACGTGCCGA CAAGGGGCTG 480 

GTGAGCAGCC GCCTGCTGCT GCGTGCCGGT TGA 513



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 170 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:



Met Asn Gly Leu Leu Met Gln Trp Gln Ala Arg Leu Ala Gln Asn Pro

1 5 10 15

Leu Met Leu Arg Trp Gln Gly Leu Pro Pro Arg Asp Arg Leu Ala Leu

20 25 30

Gly Leu Leu Ala Ala Phe Leu Leu Leu Val Leu Leu Tyr Leu Leu Leu

35 40 45

Trp Arg Pro Val Ser Gln Asn Leu Glu Arg Ala Arg Gly Phe Leu Gln

50 55 60

Gln Gln Arg Thr Leu His Ala Tyr Leu Gln Glu His Ala Pro Gln Val

65 70 75 80

Arg Ala Arg Gln Val Ala Pro Gln Ala Ser Ile Glu Pro Ala Ala Leu

85 90 95

Gln Gly Leu Val Thr Ala Ser Ala Ala Ser Gln Gly Leu Asn Val Glu

100 105 110

Arg Leu Asp Asn Gln Gly Asp Gly Gly Leu Gln Val Ser Leu Gln Pro

115 120 125

Val Glu Phe Ala Arg Leu Leu Gln Trp Leu Val Ser Leu Gln Glu Gln

130 135 140

Gly Val Arg Val Glu Glu Ala Gly Leu Glu Arg Ala Asp Lys Gly Leu

145 150 155 160

Val Ser Ser Arg Leu Leu Leu Arg Ala Gly

165 170



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1176 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GATCTCGAGG GCGTCGGCTT CGACACCCTG GCGGTGCGCG CCGGTCAGCA TCGCACGCCG 60 

GAGGGCGAGC ATGGCGAGGC CATGTTCCTC ACCTCCAGCT ATGTGTTCCG CAGCGCCGCC 120 

GACGCCGCCG CGCGCTTCGC CGGCGAGCAG CCGGGCAACG TCTACTCGCG CTACACCAAC 180 

CCGACCGTGC GCGCCTTCGA GGAGCGCATC GCCGCCCTGG AAGGCGCCGA GCAGGCGGTG 240 

GCCACCGCCT CCGGCATGGC CGCCATCCTG GCCATCGTCA TGAGCCTGTG CAGCGCCGGC 300 

GACCATGTGC TGGTGTCGCG CAGCGTGTTC GGCTCGACCA TCAGCCTGTT CGAGAAGTAC 360 

CTCAAGCGCT TCGGCATCGA GGTGGACTAC CCGCCGCTGG CCGATCTGGA CGCCTGGCAG 420 

GCAGCCTTCA AGCCCAACAC CAAGCTGCTG TTCGTCGAAT CGCCGTCCAA CCCGTTGGCC 480 

GAGCTGGTGG ACATAGGCGC CCTGGCCGAG ATCGCCCACG CCCGCGGCGC CCTGCTGGCG 540 

GTGGACAACT GCTTCTGCAC CCCGGCCCTG CAGCAGCCGC TGGCGCTGGG CGCCGATATG 600 

GTCATGCATT CGGCGACCAA GTTCATCGAT GGCCAGGGCC GCGGCCTGGG CGGCGTGGTG 660 

GCCGGGCGCC GTGCGCAGAT GGAGCAGGTG GTCGGCTTCC TGCGCACCGC CGGGCCGACC 720 

CTCAGCCCGT TCAACGCCTG GATGTTCCTC AAGGGCCTGG AGACCCTGCG TATCCGCATG 780 

CAGGCGCAGA GCGCCAGCGC CCTGGAACTG GCCCGCTGGT TGGAGACCCA GCCGGGCATC 840 

GACAGGGTCT ACTATGCCGG CCTGCCCAGC CACCCGCAGC ACGAGCTGGC CAAGCGGGAG 900 
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CAGAGTGCCT TCG6CGC6GT GCTGAGCTTC GAGGTCAAGG GCGGCAAGGA GGCGGCCTGG 960 

C G TTTCATCG ATGCCACCCG GGTGATCTCC ATCACCACCA ACCTGGGCGA TACCAAGACC 1020 

ACCATCGCCC ATCCGGCGAC CACCTCCCAC GGTCGTCTGT CGCCGCAGGA GCGCGCCAGC 1080 

CCCGGTATCC GCGACAACCT GGTGCGTGTC GCCGTGGGCC TGGAAGACGT GGTCGACCTC 1140 

AAGGCCGACC TGGGCCGTGG CCTGGCCGCG CTCTGA 1176 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 392 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:



Tyr Asp Leu Glu Gly Val Gly Phe Asp Thr Leu Ala Val Arg Ala Gly

1 5 10 15

Gln His Arg Thr Pro Glu Gly Glu His Gly Glu Ala Met Phe Leu Thr

20 25 30

Ser Ser Tyr Val Phe Arg Ser Ala Ala Asp Ala Ala Ala Arg Phe Ala

35 40 45

Gly Glu Gln Pro Gly Asn Val Tyr Ser Arg Tyr Thr Asn Pro Thr Val

50 55 60

Arg Ala Phe Glu Glu Arg Ile Ala Ala Leu Glu Gly Ala Glu Gln Ala

65 70 75 80

Val Ala Thr Ala Ser Gly Met Ala Ala Ile Leu Ala Ile Val Met Ser

85 90 95

Leu Cys Ser Ala Gly Asp His Val Leu Val Ser Arg Ser Val Phe Gly

100 105 110

Ser Thr Ile Ser Leu Phe Glu Lys Tyr Leu Lys Arg Phe Gly Ile Glu

115 120 125

Val Asp Tyr Pro Pro Leu Ala Asp Leu Asp Ala Trp Gln Ala Ala Phe

130 135 140

Lys Pro Asn Thr Lys Leu Leu Phe Val Glu Ser Pro Ser Asn Pro Leu

145 150 155 160

Ala Glu Leu Val Asp Ile Gly Ala Leu Ala Glu Ile Ala His Ala Arg

165 170 175

Gly Ala Leu Leu Ala Val Asp Asn Cys Phe Cys Thr Pro Ala Leu Gln

180 185 190

Gln Pro Leu Ala Leu Gly Ala Asp Met Val Met His Ser Ala Thr Lys

195 200 205

Phe Ile Asp Gly Gln Gly Arg Gly Leu Gly Gly Val Val Ala Gly Arg

210 215 220

Arg Ala Gln Met Glu Gln Val Val Gly Phe Leu Arg Thr Ala Gly Pro

225 230 235 240

Thr Leu Ser Pro Phe Asn Ala Trp Met Phe Leu Lys Gly Leu Glu Thr

245 250 255

Leu Arg Ile Arg Met Gln Ala Gln Ser Ala Ser Ala Leu Glu Leu Ala

260 265 270

Arg Trp Leu Glu Thr Gln Pro Gly Ile Asp Arg Val Tyr Tyr Ala Gly

275 280 285

Leu Pro Ser His Pro Gln His Glu Leu Ala Lys Arg Gln Gln Ser Ala

290 295 300

Phe Gly Ala Val Leu Ser Phe Glu Val Lys Gly Gly Lys Glu Ala Ala

305 310 315 320

Trp Arg Phe Ile Asp Ala Thr Arg Val Ile Ser Ile Thr Thr Asn Leu

325 330 335

Gly Asp Thr Lys Thr Thr Ile Ala His Pro Ala Thr Thr Ser His Gly

340 345 350

Arg Leu Ser Pro Gln Glu Arg Ala Ser Ala Gly Ile Arg Asp Asn Leu

355 360 365

Val Arg Val Ala Val Gly Leu Glu Asp Val Val Asp Leu Lys Ala Asp

370 375 380

Leu Ala Arg Gly Leu Ala Ala Leu

385 390



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 847 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

ATGCTGAAAA AGCTGTTCAA GTCGTTTCGT TCACCTCTCA AGCGCCAAGC ACGCCCCCGC 60 

AGCACGCCGG AAGTTCTCGG CCCGCGCCAG CATTCCCTGC AACGCAGCCA GTTCAGCCGC 120 

AATGCGGTAA ACGTGGTGGA GCGCCTGCAG AACGCCGGCT ACCAGGCCTA TCTGGTCGGC 180 

GGCTGCGTAC GCGACCTGCT GATCGGCGTG CAGCCCAAGG ACTTCGACGT GGCCACCAGC 240 

GCCACCCCCG AGCAGGTGCG GGCCGAGTTT CGCAACGCCC GGGTGATCGG CCGCCGCTTC 300 

AAGCTGGCGC ATGTGCATTT CGGCCGCGAG ATCATCGAGG TGGCGACCTT CCACAGCAAC 360 

CACCCGCAGG GCGACGACGA GGAAGACAGC CACCAGTCGG CCCGTAACGA GAGCGGGCGC 420 

ATCCTGCGCG ACAACGTCTA CGGCAGTCAG GAGAGCGA1 JCCAC^GCCG CGACTTCACC 480 

ATCAACGCCC TGTACTTCGA CGTCAGCGGC GAGCGCGTGC TGGACTATGC CCACGGCGTG 540

CACGACATCC GCAACCGCCT GATCCGCCTG ATCGGCGACC CCGAGCAGCG CTACCTGGAA 600 

GACCCGGTAC GCATGCTGCG CGCCGTACGC TTCGCCGCCA AGCTGGACTT CGACATCGAG 660 

AAACACAGCG CCGCGCCGAT CCGCCCCCTG GCGCCGATGC TGCGCGACAT CCCTGCCGCG 720 

CGCCTGTTCG ACGAGGTGCT CAAGCTGTTC CTCGCCGGCT ACGCCGAGCG CACCTTCGAA 780

CTG ~GCTCG AGTACGACCT GTTCGCCCCG CTGTTCCCGG CCAGCGCCCG CGCCCTGGAG 840 

CGCGATC 847 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 282 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 



Met Leu Lys Lys Leu Phe Lys Ser Phe Arg Ser Pro Leu Lys Arg Gln

1 5 10 15

Ala Arg Pro Arg Ser Thr Pro Glu Val Leu Gly Pro Arg Gln His Ser

20 25 30

Leu Gln Arg Ser Gln Phe Ser Arg Asn Ala Val Asn Val Val Glu Arg

35 40 45

Leu Gln Asn Ala Gly Tyr Gln Ala Tyr Leu Val Gly Gly Cys Val Arg

50 55 60

Asp Leu Leu Ile Gly Val Gln Pro Lys Asp Phe Asp Val Ala Thr Ser

65 70 75 80

Ala Thr Pro Glu Gln Val Arg Ala Glu Phe Arg Asn Ala Arg Val Ile

85 90 95

Gly Arg Arg Phe Lys Leu Ala His Val His Phe Gly Arg Glu Ile Ile

100 105 110

Glu Val Ala Thr Phe His Ser Asn His Pro Gln Gly Asp Asp Glu Glu

115 120 125

Asp Ser His Gln Ser Ala Arg Asn Glu Ser Gly Arg Ile Leu Arg Asp

130 135 140

Asn Val Tyr Gly Ser Gln Glu Ser Asp Ala Gln Arg Arg Asp Phe Thr

145 150 155 160

Ile Asn Ala Leu Tyr Phe Asp Val Ser Gly Glu Arg Val Leu Asp Tyr

165 170 175

Ala His Gly Val His Asp Ile Arg Asn Arg Leu Ile Arg Leu Ile Gly

180 185 190

Asp Pro Glu Gln Arg Tyr Leu Glu Asp Pro Val Arg Met Leu Arg Ala

195 200 205

Val Arg Phe Ala Ala Lys Leu Asp Phe Asp Ile Glu Lys His Ser Ala

210 215 220

Ala Pro Ile Arg Arg Leu Ala Pro Met Leu Arg Asp Ile Pro Ala Ala

225 230 235 240

Arg Leu Phe Asp Glu Val Leu Lys Leu Phe Leu Ala Gly Tyr Ala Glu

245 250 255

Arg Thr Phe Glu Leu Leu Leu Glu Tyr Asp Leu Phe Ala Pro Leu Phe

260 265 270

Pro Ala Ser Ala Arg Ala Leu Glu Arg Asp

275 280

What is Claimed: 

1. An isolated nucleic acid encoding a kinase from a Pseudomonad that can 
regulate the expression of a lipase. 

2. The nucleic acid of Claim 1, wherein the kinase is LipQ. 

3. The nucleic acid of Claim 1, having the sequence as shown in
SEQ ID NO:1.

4. A purified kinase encoded by a nucleic acid of Claims 1-3. 

5. An isolated nucleic acid encoding a DNA binding regulator from a 
Pseudomonad that can regulate the expression of a lipase. 

6. The nucleic add of Claim 6, wherein the DNA binding regulator is LipR. 

7. The nudeic add of Claim 6 having the DNA sequence as shown in SEQ 
ID NO:3 

8. A purified DNA binding regulator encoded by the nudeic add of Claims 5- 

7. 



9. An isolated nudeic add encoding a Rseudomonas alcaligenes upstream 
activating sequence having the DNA sequence as shown in SEQ ID NO:5. 

10. An isolated nucleic add encoding a Pseudomonas alcaligenes sigma 54 
promoter that can regulate expression of a lipase. 

11. A purified Pseudomonas alcaligenes sigma 54 promoter that can regulate
expression of a lipase.

12. An isolated nucleic acid encoding a Pseudomonas alcaligenes secretion
factor selected from the group consisting of XcpP, XcpQ, OrfV, OrfX, XcpR, XcpS,
XcpT, XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and OrfY.

13. The nucleic acid of Claim 12 wherein said nucleic acid has a sequence
selected from the group consisting of SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 30,
SEQ ID NO: 16, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 18, SEQ ID
NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, SEQ ID NO: 32 and SEQ ID NO: 34.

14. An expression vector comprising nucleic acids encoding a kinase, a DNA 
binding regulator, a promoter and an upstream activating sequence. 

15. The expression vector of Claim 14, wherein the kinase is LipQ, the DNA 
binding regulator is LipR, the promoter is a sigma 54 promoter from a Pseudomonad, 
and the upstream activating sequence is UAS. 

16. The expression vector of Claim 14, wherein the nucleic acid encoding the 
kinase has the sequence shown in SEQ ID NO:1. 

17. The expression vector of Claim 14, wherein the nucleic acid encoding the 
DNA binding regulator has the sequence shown in SEQ ID NO:3. 

18. The expression vector of Claim 14, wherein the nucleic acid encoding the 
upstream activating sequence has the sequence shown in SEQ ID NO:5. 

19. The expression vector of Claim 14, further comprising a secretion factor. 

20. The expression vector of Claim 19, wherein the secretion factor is 
selected from the group consisting of XcpP, XcpQ, OrfV, OrfX, XcpR, XcpS, XcpT,
XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and OrfY. 

21. A plasmid containing the expression vector of Claim 14.



22. A method of transforming a host cell comprising adding a plasmid
containing the expression vector of Claim 14 to host cells under appropriate conditions.

23. The method of Claim 22, wherein the host cells are bacteria.

24. A method of transforming a host cell comprising adding a plasmid 
containing the expression vector of Claim 19 to host cells under appropriate conditions. 

25. The method of Claim 24, wherein the host cells are bacteria. 

26. A transformed host cell containing the expression vector of Claim 14. 

27. The transformed host cell of Claim 26, wherein the host cell is a bacterium.

28. The transformed host cell of Claim 27, wherein the bacterium is a
Pseudomonad.

29. The expression vector of Claim 14 further comprising nucleic acid 
encoding a protein. 

30. The expression vector of Claim 29, wherein the protein is an enzyme. 

31. The expression vector of Claim 30, wherein the enzyme is a lipase.

32. An isolated nucleic acid encoding a Pseudomonas alcaligenes lux-box
binding element.

33. An isolated nucleic acid encoding an orfV-box binding element. 
FIGURE 1A [lipQ nucleotide sequence with deduced amino acid sequence, nucleotides 1-675; sequence text illegible in the source scan]
FIGURE 1B [lipQ nucleotide sequence with deduced amino acid sequence, continued from nucleotide 676; sequence text illegible in the source scan]
FIGURE 2A [lipR nucleotide sequence with deduced amino acid sequence, nucleotides 1-675; sequence text illegible in the source scan]
FIGURE 2B [lipR nucleotide sequence with deduced amino acid sequence, continued from nucleotide 676; sequence text illegible in the source scan]
[Figure 3 sheet: double-stranded nucleotide sequence with deduced amino acid sequence of an annotated open reading frame, nucleotides 1-630; sequence text illegible in the source scan]



WO 98/06836 PCT/US97/14450 

FIGURE 3B [double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 631-1260; sequence text illegible in the source scan]
FIGURE 3C [double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 1261-1960, annotated XcpQ; sequence text illegible in the source scan]
FIGURE 3D [double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 1961-2590, annotated XcpQ; sequence text illegible in the source scan]



[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 2591-3220, annotated XcpQ; sequence text illegible in the source scan]



[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 3221-3850, annotated XcpQ and XcpP; sequence text illegible in the source scan]



[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 3851-4480, annotated XcpP; sequence text illegible in the source scan]



[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 4481-5110, annotated OrfV; sequence text illegible in the source scan]



[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 5111-5740, annotated OrfV; sequence text illegible in the source scan]
[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 5741-6370, annotated OrfV; sequence text illegible in the source scan]



[Figure 3, continued: double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 6371-7000, annotated OrfV; sequence text illegible in the source scan]
FIGURE 3L [double-stranded nucleotide sequence with deduced amino acid sequence, nucleotides 7001-7630, annotated OrfV and OrfX; sequence text illegible in the source scan]






CGCC7GGCGACGCGC7CGCCTCCCCCAGCC . . . C ICirCCGCCGC ACCC AGCCCGACCCCACCC^TACCCC 

, , i ■ ■ * ii> 7/00 

GCCGACCGC7GCGCGAGCCGAGGCCC7CGGACAGCCACGCGGCG7GGG7CGG5C7GGCCrGGGCA7GGGG 

PAGOALGSGSLSVRRTOPOP T 3 T ? 

GGCCCAGGCCGGCAAGCCGGC:aCC7ACCGCAACCAC^ACACCCAC:GG7GGGATGGCTCGC^GT7G7AT 
CCGCC7CCGGCCGTTCGGCCGGTGCATGGCGTTGGTC7TGTGGG7GACCACCC7ACCGAGCGT;aaCA7A 

AEAGKPATrRNHN7 HWWOG SO L f 



ggcaccagcaaggacatcaacgacaaggtgcccgccttcgagggtggcaagctgaagatcaatcccgacg 

t , -i ■ ■ > I. h 78MO 

CCGTCG7CGTTCCTGTACTTGCTGTTCCACGCGCGGAAGCrCCCACCG7TCCACTTC7AGT7AGGGCTGC 

GSS KOINOK¥»AF£GGKLK I N ? 0 
— OrPT 



GTACCCTGCCGiCCGAGJrrrTCAGCGGCAAGCCGATCACCGGCTTCAACGAGAAC-GGrGGGTT^GC:- 

, i * ► ~910 

CATGGCACGGC7G3C7CAAGGAGTCGCCG77CGGC7AG7GGCCGAAG7 TGC7C 7TGACCACCC--XCZ*GA 

GTL DTE * r -SGKP| T 3 r N E N W W / C - 
QiflC ■ 

GAGCATGC7GCACCAGC •G"*CACTAAGGAGCACAACGCCATC GIGGCGArGC7CCAGCAGAAG'ACIZ^ 

t » ^ "980 

CTCGTACGACG7GG7CCACAAGTGATTCCTCC7GTTCCGG7 ACCGCCGC TAC3AGG7CG7Z "C AT G3GC 

SHL«0-L"TICEHNA; A A K 2 G » '=» 

"- v — 



GACAAGGACGAzcAG:GG:'G7ACGACCA7ccGCG::7GG:cAA:T:cGC3:"A7G^::Ai3~:::«:- 

! i ■ aooO 

CTGT7CC73CTGG7CACCGACATGC7GG7ACGCGCGGACCAG77GAGGCGC3-CTAC;33""-3G" 3" 

OK0DaW_Y0MA^-_VS5A'_^A< ' - 

fx ^ t — — . 



CCGTGGAA7GGACCCCGGCGG7CA7CGCCAACC2GG7CAGCGAACGCGCCA7G7 ATGCCAA" "GGT 3333 

t » ! ► 8(20 

GGCACCTTACC7GGGGCrGC:AC7AGCGG77GGGCCAG7GGC77G:GCCGTACATACGG77:ACGA:::: 

TVEW7PAV ianpv t e r a n y a n w w g 

OrtX 

CCTGC7GCG77:CGG7CCGGAGCG7GACAAG7ACCAGGAAGAGG:GCGCA7GC7GCAGGAGGACC;GG:: 

, iii i i ► 8190 

GGACGACCCAAGGCCACGCC7CGCACTCTTCA7GGTCCT7CTCCGCGCCTACGACG7CCTCC7GGACCGG 

LLCSGPEROK r 0 E E A R M L 0 I 2 L A 
______ OrtX 



ACCTCCAAC7C:T7CG7CCTGCGCAT7C7CGGCATCGACGGCAGCCAGGCCGGCAG7TCGGCCA7CGA;: 

, 1 ■ 1 — ' 1- 8260 

TCGA6GT7GAGGAAGCAGGACGCGTAAGAGCCG7AGC7GCCG7CGGTCCCGCCCTCAA5CZGG7AGC7GG 

5SN5?V>. RILG IDGS0AGSS4 10 






ATCCCCTCCCCGGCATCCTCSSCTCCACCAAu^CCAACAACTACSGCCTCCCCTACACCCTGACCGASCA 

i — i 1 ' . ... i 8330 

TACGCGACCCac:3TAGCA:;:3AGCTGGrTGGGCTTGT7GArCC:GCACCGGATGTGa^ACTaa;TCCT 

h a l a g i vgstnpnnygvpytlt££ 
gttcctcgcggtctaccgca:gcacccgc7Gatgcgcgacaagg;cgatgtctacgacatcgg:t;gaac 

i — — ■ ■ ■ ' « 1 ■ ' ► 6400 

CAACCACCCCCACArGGCGTACGTGCGCGACTACGCGCTGTTCCAGCTACAGATGCTGTAGCCGAGCTTG 

pvAVYRriHPLMROKVDVYO » G S N 

— Orfx 



ATCATCCCGCGCAGCGTGCCGCTGCAGGAGACCCGCGATGCCGACCCCCACGAGCTGCTGCCGGACGAGA 

i ■ 1 ► 8*70 

TACTACCGCCCCTCGCACGGCGACGTCCTCTGGGCCCTACGGCTGCGGCTCCTCGACCACCGCCTGCTCT 

i iarsv^lqetroaoaeellaoe 
____ orfx 



ATCCCGAGCGCCTGTGCTACTCCTTCCGCATCACCAACCCGGGCTCGCTGACCC7CAACAACrA::CGAA 

_ i , • . > . 8540 

taccgctcgcggacaccatgaggaagccgtagtgcttgggzccgagcgactgggagt:gtt3A7cgg:** 

MPEP-LWrSFGI TN?G5LTLNri* 3 ^ 



CTTCCTGCGCAACCTGTCCArGCCGCrGGrCGCCAACATCGACCTGGCGACCATC^ACGrGCrG'GTG-: 

j i ■ ► 8610 

GAACCACCCCTTGCACAGG'iCGGCGACCACCCGTTGTA3CTGGACCGC TGGTAGCTGCACGACACAC" j 

FL»NcSlPLVGN[DLAriO J L Z Z 



CGCCAGCSCGGGGTGCCGC j- TACAACGACTTCCGCCGCGAGATCGGCCTCAACC"GATC ACC--GT" j!1 

- , , t 1 —i 3680 

gccctcgcgccccacggig; -a 7cttgctcaagccggcgc*ct-jCCGgagtt^j^c t -^"S" **:--:: 

RERGvP-YNc^ R R E IGL'^ D ' _ 



agcacctgaccaccgacc:ggccaccctggccaacctcaa3cgcatctacggcaacgaca;:sagaagi" 

— _ _ . r 3750 

tcctggactggtggctggg-;ggtgggaccggttcgagttcgcgtagatgccgt7g:tg:agc 'Z'~z~~ 

E 0 L TTDPA TlANL K R I Y G N 0 ! £ < : 



tgacaccctggtcggcatgctggccgacaccgtccctcccgaccgcttccccttcggcgagacggcctt: 

■ ► 8820 

ACTGTGGCACCACCCGTACGACCGGCTCTGGCACCCAGGCCTGCCGAACCGGAAGCCGCTCrGCCGGAAG 

dtlv gmlae tvrpogfafgeta- 



cagatcttcatcatgaacgcctcgccccgcctcatgaccgaccgcttctataccaaggactaccgcccgg 

, i ■ t ii. 8890 

ctctacaag:ag"acttg:ggagcgccgcggactactggctcgcgaagatatgg7tcctgatgccgg^:: 

0 iFinNASRRU MTORf yTkOy R ° 






»GATCTiCic:s:::ASCGc:T33CCTCGS. .-oaacac:-ccatc;;c£acc:::':--- c:::acaa 

■ 1 8360 



fc 1 ~» 

Tccsc ^C73^":-ACAGc:"33TTGGCG'^AAAAca::::cAAAcccT^GGGc:*:-A;-:::r3acc 

AGGCGTCGACCAGTToTCGGACCAACCGCACITTTrGCGGAAGTTTGCGACCCCGGACrririGiGCCGG 
P G * VNSLVGVtNAFKPWG LN 1 » A 



gactaccagag:"gccggccaaggccaagcagcacaacc:gtggctcaaccgcg:cntgcgcacccagt 
ctgatgctc7cgaccggcccgttccggttcg7cctgttggacacccag7tgccgcggriacgcg7gggtca 

J Ott — 



ACGCc::^c:^^:TGc:aG:cATTc:s::^:Ga.ics7:^CGGc:7iATCAa-:: ^'::":3^A 

1 9 

7GCCCCG7CCGG"GACGBCCGG7AACGCGGICAC" 3C-GCCGCCGGAC 7AG 



CAAGG7GCAGACCAANTCCGACG7iGCGIZ ^ ^CCGGC 7i.CSAGAAGGCt ATGCACCd!-" 3 ^CG -»A • ^ 

■ — i — — 9^-0 

cTrccACG7c:GG"r*:AGGCTGCAccG:G^:::3:cGA7a::;7T;c3G:ACG;^3Gc:*-::::-c7-: 

K V 0 ~ - SC VA^A^re < i M " => - Z, ' ~ 



CCCAAGGTCAAGT7CACCGCCG7ICCGGGGCiCCrC7ACACCG 



CGGTTCCAG77Ciio7GGCGGCA^GGwCw_^ . jwoA . „ . ^u^w^A^-.*^ 



A • G7GGCCGGAC^ AGG ■ _ _ - - ». -> -» 



A iCV< r TAVOG-* ;, '' T GL 
QfX 



7GC7GCGCC7G'IGG7GGCCGGCGACCCGGC-iACCAACGG:77CCAGCCGGG7C7n^:3 -> ■>A~->.»_ , • ^ ^ 
ACGACGC^GACAoC3-C^GGCCGCTGGG-CG77 jG77GC2GAAGGTCGGCCCAGAc-^.~__ — 

cl^lsvacd^atncfqpglaw < a - 

cctcgacgg-aagccgtcgcagaacgtctccgcgctctacaccctgagccggcagggcagcaaccacaac 

i • ■ ■ ■ 9^ SO 

CCAGC7GCCG7-:GGCAGCGTCTTGCAGAGGCGCGACA7GTGGGACTCGCCCC7C:CG7:G77GG7G7rG 

V0G<°SONV5ALrTL5GG35***N 



TTCT7'-GCCAACGAGCTCTCGCAGTTCGTCC7GCCGCAGACCAACCATACCCTGGGCACCACGC:GCTG7 

" — i.i * 9620 

AAGAAGCG377GZ7;GACAGCG7CAAGCAGGACGGCCT£?GB7 7GC7A7GGGACCCG"G7CCGACGAC A 

FrAN r _SOFVL°ETNOTlG- T LL 
- (MX 



3 AO 






TCrCCCr3GTCASCC7CAACCCCACC7TCC:^GCa7aGACGACA7GGCCGAAGTGACC CAGACCaGCCA 

l , i - - - ■ ' 1 * 9590 

AGACCCACCACTCGGACTTCGGCTGCAACCACGCaCACCTaC^GrACCGGCTTCACTGGGrCTi^CCSG: 

F5 - v5LKPTLLav33!lAEvT 2 T G Q 

GGCCG^^ACTTCSGrCAAGGCGCCGACGCAGATCTACTTCG^GCCCAACCCGGASCTaCGCA GCCTGTT: 

, i 1 i t - ■ ■ • ■ * 9660 

CCCGCA:7GAAGCCAGTTCCGCGGCTGCGTC7AGATGAAGCACGGGiTCSGCCTCGACGCG7CGGACAAG 

AVTSVKAPTQiyFVPKPELPSLr 

omt 



TCCAGTSCGGtGCATGACTTCCGCAGCGACC7GACGAGCCTCACCGCCGGCACCAACCTSTACGACGTCT 
AGGTCACGCCGCGTACTCAAGCCCTCCCTGCACTGC7CGGAGTGGCGGCCCTGGTTCGACATGCTGCAGA 

SSAAHDFWSOL TSLTAGT<Lr3V 



ACGC7ACCrCGA7GGASATCAAGACCTCGA7CC7GCCG7CGACGAA7CG7AGCTACGCCCAGCAACGGZG 
TGCCATGGAGC7ACC7C7AGTTCTGGAGC7AGGAC £GCAGC7GG77AGCA7CGATGC;i3"CG7 73C£,*- 

y A T S * r I < T S I L 3 57MRS?AQ0 ! >* 
_ ^ _ ^ — OfK — — — — — — — — 

CAACAGCGCGS73AAGATCGGCGACA?CGAGC7GA::-:3C:GT*CArCC;CC7C3^:C7-:^GC3ACAAC 

- 98 "0 

GTTGTCGCGCCAC77C7AGCCGCTCTACC7C«AC7^GAGC3GCAASTAGCGGASC:^GiiGCCjCTGTr3 

H $ A 4 < IGEHEL*S? - tASA-^uN 



CCGGTGT7CT7:AAGCACCAGCGTCACGAAGACAAA TAAGGG7CA7CCC::CC7GAACAC::::G^::a 

I, r i ■ - —-r 1 * - ~ ■ ■ -- ' 99*40 

CCCCACAAuAAC:7:3TGG7CCCAGrGCT7C7aT7rA77:CCA^:AGGGAACGA:**S 



OCT 1 



tgccgg-gctt— "UTccACGcc7TACGT:cA7CACAC7c-::Gc:AGGcr3-3:-a::;:::^CAiA 

■ 10010 

ACCGCZHGAAAAAACACG7GCGGAATGCAGCTAG7G7GAAGACGCGG7CCGACACGAC3GCGGACG77* 

i 

ATCGG:AC7GCAGTTTTTGCGCAAATCCGTTAACr7GCCGCC7:GGCCATGCCATAAAAACAACAAGAAC 

■ i 1 ' 10080 

TAGCCG7GACG7CAAAAACGCGTTTAGGCAATTGAACCGCGGAGCCGGTACGGTATT777GT7GTTC77G 

I I 



AACAGCAAGArCGATCTTCTCTTCGGGGAACGCA7CCGCCCATGTCCACCCATACCCACGCCGCCC7GAC 

■ 10150 

TTGTCGT7C7ACC7AGAAGACAAGCCCCr7GCG7AGGCGGG7ACAGGTGGCTATGuGTGCGGCGGGACTG 

t M5T0T HAAL7 
■ Mep n 



3*P 






GaCTC:CaCAAGCC:C GCCTTGCCCCCGCT;CCCTTC:CC"CGCCAAACGCCAC^CaTG:':CT3CG: 

ccgagggcgttcggggcgcaacccgggcgacgggaag:^gaagcggt:t3CGGtg::gcac:a:^cgcg 

APASPAL^^L 3 "*' A K 3 * G * - - 3 
xcpfl 



10220 



GAGCCCTrCG GCCAGGTCCAGCrGCAGGTGCGCCGCGGTGCCAGCCTGGCCGCCGTGCAGGAGGCCC^S: 

ctcgcgaagc:ggtccacgtcgacgtccacgcggcgc;acggtcggaccggcggca:gtc::;:ggg:cg 

cpr^GVOLOV^RGASLAA V 0 E A 3 

- "t" — 

CCTTCGCCGGCCG CGTGCTGCCGCTGCACrGGCTGGAGCCCGAGGCCr y :GAGCAGGAGC:GG::CTGGC ^ 

cgaagcggccggcccacgacggcgacgtgaccgacctcgggctcccgaagctcctcctcgaccgggaccg 

R FAG3VLPL4WL£ 3 E A F £ 0 E L A L 1 

cTACCAGCG:GA c T :::zcGAGGTGCGGCAG^::G:c:AG:G:A:GGG:GCCGAAc:T^A:::xa::^^: 

GATGGrCGC::TGAGGAGGCTCCACGCCG:C'AC;^GC"::^TACCCACGGCT*:AACT3G-:::G::S 
YQ^^SSEvR2-ie:-GA 



c7GCCCGAAcr:Ac:c:c^AATcc GGCGA:c'::~^GA^:-GGAAGft'GACG:s::sATc-'::^::*3" 

CACCGGCrTGA5;GAGGGCTTAGGCCGCTGGACGAC:T:G:::T;rTACTGCGC-GCTiv>. i^o^o^Av. • 



IC500 



tcaacg cca:c:"agcgagg:gatcaacg::ggcg::'::gacatccacc:gc 

AGTTCCGGTAGGAGTCGCTICGCTAGTTCZ^^CC-C^GiGGCT^TAGGTGGAC: 
(MA ; _ 5 E - I < - 



•C570 



ccTGGTGGTi:3:rT7CG:GT:GACGGCA'::'::zcGAAG:^TCGAAcc^c^::G:n-^:':::33:3 
ggaccaccacgcgaaagcgcagctgccgtaggaggcgcttcactagcttggcgcggc3C":s~c:gc:^- 

ivvrtrvog ieprre^aa 

— 1^ " 



CT GCTGGTCrZGCGGGTCAAGGTCATGGCGCGCCTGGACATCGCCGAGAAGCGCGTACCGCAGGACGGIC 

gacgaccagagcccccagttccagtaccgcg;ggacctgtagccgctcttcgcgcatggcgt;:tgccgg 

I : vSnVKVMARL0lAEXRVD a0 3 



GTATTTCGCTCAaGGTCGGCGGTCGCGAGGTGGATATCCGCGTCTCCAC CCTGCCGTCGGCCAACGCCGA 

- ' — 1 ^— ■ ■ < i ■■■■■■■■ ■ 1 ■ ■ ■ ' ■ ■ i t 10^80 

cataaagccagttccagccgccaccgctc:a:ctataggcgcagagg:gggacggcagcc-g::gc:gc: 

R15L<VCCREV3I * * S T L P S A N Z • 



3 






GCGGGTGGTGCTGCGTCTGCrCGACAAGCA'., .1GGCCCCTGTCGC TCACGCATCTGCGCA TGAGCGAG 

■ 1 ■ ■ ■ i i - ■■.«-,■■ > i QoS>U 



RV^L^L-OKOAaSLSLTH L ^ « S £ 



caccACCs:::c::s:::aA:aA:AACc:cc^:AAGc:^:A:s^CA:cA:::TAGTCAcc^:c::A:cc 

GCGCTCCCSGCaGACGACCTXTaTTGGACGCGTTCaSCGTGCCGTAGTAGGATCAGT^CCSCaG:^: 

RD^^L^ODNLRKPwGI ILVT 3 ° r 

"t° 1,1 1 

GCTCGGGCAAGACCACCACCCTCTACGCCGGCCTGGTCACCCTCAACGACC^CTC^CGCAATATCCTCAC 

, : — i i . i 10990 

CGACCCC^'TCTCGTGGTGGGACArcCGGCCGGACCAGTGGGACTTGCTGGCGAGCGCGT'ATAGGAGTG 

CSG KT-T_fAG L/ rLNOSSP^i . 7 

■ ■ 'T° 

gctccaagac::ga::gagtactacctggaaggcatcsg:cagacc:aggtcaacc:g;ggg:ggaca;~ 

, ' - . — ... - i 

ccacctt:tggg;:agctca"ga*ggacct':::;agc:3: 



■ *> -> v4 » ^- W — «J . 



** " ■ <C3*> 



j^CC"3CICGCCA"CCTGC^CCAGGAC -I jSACG* 3 j*GATGG*,^jC jA^a — ^ ^ 

tccaagcgggcgc:g3azg:gcgg:aggacgggg:::7gggcctgcacca::accag::g:"; 

T F A R G sail^S-^^' '-''-^ ^ 





ACCAGCAGACCGC'GACATCGCCGTGCAGGC- "wGCTC ACCGGCIACCTGGTGC T A ..-^a, 

, i I - ■ ' ■ ■ ■ - ■ ■ ■ ■ ■ -" - ' I • £ 

TGGTCC*t TGoCGG.TGTAGCGGCACGTCCGGaGCGAG^GGCCGGTGGACCACGAGA jG"^ jGaZS - -> 

□ or ~ A 2 : AVOAS - * G h j ; " _ - 
xcpP 



CAACACCGCZGTCGGIGGCGTZAZCCGCCTGGTCGAIA rGGGCGTIGAGCII"-" _-• -- ^ 

gttgtcgcigcag:cgzggcagtgggcggaccagc"gtacccgcagctcgggaaggacga:agcagca^g 

HSAVSAVTRLVOJICVEPFl L 5 5 5 

* — a 



ctgctcggcctgctggcccagcg:ctggtgcgcgtgctctgcg7gcactgccgcgaggcg;:cc:^gcts 

_ i ■ i » 1 ' 113^0 

gaccag::gcacgaccggg:cgcggaccacgcgcacgagacgcacctgacggccctc:g:gcgcg::gac 

LL - VL A0RLVRVLCVrtC9EA9O 4 

■ ^T 0 i — ■ ' 



acgcgg;:gag:gcggcctgctcggcckgacccgcacagccagcccc:gatctacca:gccaagggc:g 

. i mo 

TGCGCCGGCTCACGZCGGACGAGCCGGAGC TGGGCGTGTCGGTCGGGGAZ TAGAT 



DAi£: - L _GL0P^5QPL: * * A < n 
- — XcoS 






CCCCCAGrCCCACCACCAGSSCTACCSCGaCCGTACTGCCA7C:ACSACCTGCTGATCr7C:ACoACCAG 

i . ' 1 1460 

GGCCCTCACGG^GGTCGTCCCoATGGCGCCGGCATGACCGTACATGCTCGACCACrAGAAGCT^CrGCTC 

PFCHOOSYRGRTGI Y E L V I F 0 0 0 
XcoR 



ATGCGCACCC7GGTGCACAACGGCGCCCG7GAGCAGGAGC:GATTCGCCACGCCCGCAGC:::saCCCCA 

i i ' 1- 1 1550 

TACGCGTGGGACCACGTCTrGCCGCGGCCACTCGTCCTCGACrAAGCGGTGCGGGCGTCGGAGCZGCGCT 

MRTLVHN GAGE G E L I R W A R 5 L Z P 

"t" ... 



GCATCCGCGAC3ATGGCCGGCGCAAGG7GCTGGAAGGGGTGACCACCCTGGAACAAGTGT7GCGCGTGAC 

■ ■ , i . * ■ > 11620 

CGTAGGCGCTGCTACCGGCCGCGrTCCACGACCTTCCCCACTCGTCGCACCTTCTTCACAACGCGCACTG 

S|ROOGRRK*LECVTSLEEVi_RV T 



CCCGGAAGAC:GA7CGCCG::7TCGAATACA"C3CCC:GGi:GCCAGGGGCCGCCA3CiC--GGG:GrG: 

i , . — » ■ 11690 

GGCCCTTCTGACTACCGGC jGAAGCTTA7G* AGCGGGACC7ACGGTCCCCGGC^G7;3T:7 7:CC^C«ZG 

R E 0 



-XepR- 



HAAFEr iAL3ARGR02<: 
I xcoS 



tggagggcgacagcccccg:caggtgcgccagc7gc:3:gcgacaaacagt7G7ci:;::*:ca3^ , ':^a 

, , , , i ' ■ l I "60 

ACCTCCCGCTGTCGCCGCC3G7CCACGC^G7CGACCACGCGC7C77rGTCAAC-GCGG:3A:3:::A3C" 

LEGDSA»0VROLL»O<QL 5 ? - - I £ 

XcoS 



GCCGGTACAGCGCAGGGASCAGGCCGAGGC7GG7GGC7 7CiGCC 7GC jCCGTHoC Z~ IZZZZZZLZ-'. 
^ — . — *■ 11820 

cggcca7G7cgcgtccct::"Ccggctc:gaccaccgaag7cgg-cgcgg:a;c33;:i^::^:s: zz'z 

pvC?RE3AEAGG~5'. RR:;l z ± * 2 

CTGGCCCTGG7CACCCGTCiG:TGGCGACCCTGATCGGCGCCGCGCTGCCCATCGAGGAAGC3;T3CGCG 

t ■ i 1 * 1 i 1 i 11300 

CACCGCGACCAGTCGGCAG7CGACCGCTGGCACTAGCCGCGGCGCGACGGGTAGCTCC7-CGCGACGCGC 

LALVTROLATL IGAALPIE£A L R 

CCCCCGCCGCCCAGTCCCGCCAGCCGCGCATCCAGTCGArGCTGTTCGCGGTGCCCGCCiAGGTGCTCCA 

, 1 ■ I < . h M970 

GGCGGCGGCGCGTCACCGCGGTCGGCGCGTACGTCAGCTACGACAACCGCCACGCGCGGrTCCACGAGCT 

AAAAOSSOPR I 0 S « L S. A ¥ R A < * E 

XcpS 



GGGCCACACCCTCGCCAAGGCCCTGCCCrcCTACCCGGCCGCCTTCCCCGAGCTCTACCGCGCCACGCTG 

■ i 1 ! 1 ► 120M0 

CCCGC7GTCGGACCCGTT;:GCGACCGCAGCATGGGCCGCCGCAAGGGGCrCGACA;c:G:3:GGTGCCAC 

GHSLA<ALA SVPAAPPE!_V ? A T 7 
_— — — — XcpS 






SCCGCC GGCGAGCATGCGGSSCACCTGGCGCLoST^CtGGAGCAGCTCGCCGACTACACCGAGCACC&CC ^ 
CGCCGGCCGCTCGTACGCCCCGTGGACCGCGGCCACGACCriGTCaACCGGCrGATGrGGCTCGTCGCGG 

AACEHAGHLAPVLEOLAD Y T E Q R 



AGCAGTCGCGGCAGAAGATCCAGATGGCCCTGCTCTACCCGGTGATCCTGATG CTCGCTTCGCTGGGCAT 

, | i i ■ - . . ■ ■ i ■ . ■ ■ ■■ > ■ i ■ ■ ■! ■ 1 1 t 1 2 1 60 

TCGTCAGCGCCGTCTTCTAGGTCTACCGCGACGAGATCGGCCACTAGGACTACGAGCCAAGCGACCCGTA 



OQSROKIOIALLYPVILMLASLGI 
— ■ XcpS 



CGTCCGTTTTCTGCTCGGCTACGTGGTCCCCGATGTGGTGCGGGTGTTCGTCGACTCCGGCCAGACCCTG 
GCAGCCAAAAGACGAGCCGATGCACCACGGCCTACACCACCCCCACAAGCAGCTGACGCCCCTCTCGCAC 

V G F » lgYVVPOVVRVFVOSGOTL 
* XcoS 1 ■ 



CCGGCGCTGAC:CGCGGGC:GATTTTCC7CAGCGAGCTGGrCAAG7CCTGGGGCGCCC;GGCCATCGTCC 
GGCCGCCACTGGSCGCCCGaCT'AAAAGGAGTCSITCGACCAGTTCAGGACCCCGCGGGAGCGGTAGCAGG 

p A L 7 R G '_ (FLSELVK5WGALA | v 
XcpS 



TGGCGGTGC7CGGCGTGCTCaccrT7CGCCGCGCC77GCGCAGCGAGGATCTGCCCCG3CGC7GGCATGC 
ACCCCCACGAGCCGCACGAGCGGAAAGC6GCCCGGAACGCGTCGCTCC » AGACGC3GCCG1GACC3TACC 

lavlgvlafrra l^seOlp 3 ^^-** 

__ XcpS — 

cttcc7gc7gcgcg7gc:g:-3G7cggtgcgc7gatcgccgccacccagacg3cac^c:::2::'::ac: 

GAAGGACGACGCGCACGGCS-CCAGCCACCCGACTAGCGGCGG7GGC7C7SCCG7G:^ii3:33-GC:GG 

f ! i 9VP_VGGL IAA-E743-A5' 
" — XcpS — 



CTGGCCATCC*GG7GCGtAGCGCCGTCCCACTCGTGGACGCGC7GGCCATCGGCGCCGiGr.T^G~ 3'C:a 

i — 1 ■ ■ i ► 12530 

gaccggtaggaccacccg7:gccgcacggtgaccacctccgcgaccggtagccgcggctccaccacagg7 
lailvr5gvpl v e a l a igae v v s 

»-* c 



ACCTGATCA7CCGCAGCGACGTGCCCAACGCCACCCAGCGCGTSCGCGACGGCGGCAGCCTGrCGCCCGC 

, i ..I 12600 

TGGACTAG7AGGCGTCGCTGCACCGGTTGCGGTGCGTCGCCCACGCGCTCCCGCCGTCGGACAGCCCGCG 

ML l (3S3VANAT0RVREGC5»SRA 

GCTGGAAGCCAGCCCCCAG'TrCCGCCGATGATGC7GCACATGA7CCCCAGCGGCGAGCG77e:GGCGAG 

. i t i * 1 i ► 12670 

CGACCTTCGG7CGGCCG7CAAAGGCGGC7ACTACGACGTGTAC7AGCGGTCGCCGCTCGCAAGGCCGCTC 

L £ A 5 RC r PP H M L H M I A SGE 3 S G t 
_____ ___ — XcpS 






CTCCACCACATGC .'GGCGCGCACCGCCCGC ZAGGAAAACGACCTGCCCGCCACCATCGGCC TGCTGG 

[ i ■ ■ > 1 ■ ► 12»-*0 

GACCTGCTCTACGACCGCGCGTGCCGCGCGrTGGTCCTTTTGCTGGACCGCCGGTGGrAGCCGGACGACC 

LOOHLA'TARNQiNOLAAT I G L L 



TCCGGCTGTTCGAGCCGTTrATGCTGGTATTCArGGGCGCGGTGGTGCrGGTGATCGTGCTGGCCATCCT 

t » ■ » ■ » 12810 

ACCCCCACAAGCTCCGCAAGTACGACCATAAGTACCCGCGCCACCACGACCACTACCACGACCCGTAGGA 

VGLrPPPHLVF «^G AVVLVtVLAlL 

CCTGCCCATTCTTTCTCTGAACCAACTGGTGGGTTCATAGCCATGTACAAACAGAAAGGCTTCACGCTGA 

t 12B80 

CGACGGCTAAGAAAGAGACTTGGTTGACCACCCAACTATCCCTACArGTTTGTCTTTCCGAAGTGCGACT 



MYKQKG-Tt. 
I XcpT 



U P I L 5 L M Q L V G , 



TC6AAATCATGGTGG7GGTGGTCATCCTCGGCATTCTCGCTGCCCrCGTGGTGCCGCAGG*GAT jGGCCI 

. 1 1 12950 

ACCTTTAGTACCACCACCAZCAGTAGCAGCCGrAAGAGCGACGGGACCACCACGGCGT:CiC74CC:^G: 

|EI MVVVV ILGIL A A L V V p 2 i^ G 5 

IrrT 



CCCCCACCAGGCCAAGGTCiCCCCGGCGCAGAACGACATCCGCGCCATCGGCGCCGCGCTGGACATGrA: 

, ■ , , > 13020 

GCCCCTCGTCCGGrTCCAG"GGCGCCGCG7CT73CTGTAGGCGCGGrAGCCCC33;3CGACC":rACA T G 

PDQAKVTAAOMO I R A J G A A 2 * v 



AAGCTGGACAACCAGAAC*ACCCGAGCACCCAGCAGGGCCTGGAGGCCCTGGTGAAGA-AC::i::^3IA 

, , . ■ 13090 

ttccacctgttggtcttga:gggctcgtggg7cgt:ccggacct;:gggacca:t-: '>:.:>' zzzzz* 

KL0^ONrPST03GL£ALV-:<^ r g 

_________ XcoT 



CCCCCGCGGCGAACAACrGGAACGCCGAGGGC r ACCTGAAGAAGCTGCCGCTCGACCC:*GGGGCAACCA 

, — . t ■■ > 13160 

GCGGCCGCCGCTTCrTGACCTTGCCGCTCCCGArGGACTTCTTCGACGGCCAGCTGGGGACCCCGTTGGT 

TPA AKNWNAEGYL KKLPVDPWGNQ 
— XcpT 

CTACCTGTACCTGTCGCCGGGCACCCGCCCCAAGATCGACCTCTATTCGCTGGGCGCCGACGGCCAGGAA 

1 ■ ' • ► 13230 

CATGGACATGGACAGCCCCCCGTGGGCGCCGTTCTAGGTCGACATAAGCGACCCGCGGCTGCCGGTCtTT 

Y L t; SPGTRGK I 0 L Y S L G A D G Q ~ 

XcpT 



GGCGCCGACGGGACCGACGCCGACATCGGCAACTGGGATCTCTGACTCCCAATGCAGCGGGGGCGCGG7 7 

t >■>' . '■' ► 13300 

CCGCCGCTCCCCTGGC7G:GGCTGTAGCCGT7GACCCrAGAGAC7GAGCGTTACGT:G:;-CCGCGCCAA 

GGEGTOAO IGNWOL .. -MORGRG 
XcpT ' 1 ' ■ XcpU ■ 



3>PV 






TCACTCTCATCGAGCTCCuGTGGTGCTGG TGC TGGGCGrGCTCACCGCCCTCGCCGTGCTCGGCAG 

i i i 1 ■ i ■ i > * ' I 1j. J 

AGTGACACTAGCTCGACGACCACCACGACCACGACGACCCGCACGAGTGGCCGGAGCGGCACGACCCGTC 

f J I iellvvlvllgvltglavlgs 

" xcpu 

CGGGATCGCCAGCACCCClGCGCGCAAGCTCGCGGACCAGGCCGAGCCCCTGCAGTCGCTCCTGCGGGTG 

, i ■ i i ■ i ■ - i 

cccctagcggtcctcggggcgcgcgttcgaccgcctgctccggctcgcggacgtcagcgacgacgcccac 

crA s5PARKLA0EAERt05LLRV 



ctcctccacgaggcgctcctggacaaccgcgagtatggcgtacgcttcgacgcccggagctacccggtcc 

i t — i ■ 1 — ► 13510 

gacgagctgctccgccacgacctgttcgccctcataccgcatgcgaagctgcgggcctcgatggcccacg 

iiDEAVLDNRE ygvrfdars*Rv 





rcCCCTTCGASCCGCGCACGSCGCGCTGGGAGCCGCTC GACGAGC3CGTGCACGAGC7GCCGGA37GGCT 

1 1- 13580 

acgccaagctcggcgcctcccgzgcgaccctcggcgagct3Ctcgcgcacg'gctcgacgg::::accga 
lrffprjarwe°l 3ervh£'_ :> fw:. 

■ »T" 

CGAGCTGGAGATCGAGGTCSACGAGCAGAGTGTCGGGC TGCCCGCCGCCCGT3GCGAGCAQGAC AAACCC 

, i.i ! i 12650 

GCTCGACCTCrAGCTCCAGCTGCTCCTCTCACAGCCCGACGGGCCGCGGGCAC:G:rC3*i:T- jTTTCGG 

ELEI EVDEO SVGt 3 A A R G I Z 2 < i 
XcpU ■ ■ 



GCCGCCAAGGCGCCACAGC^GCTGCTGCTCTCCACTGGCGAGCTGACCCCZ'TrGCCCSC^IC'^'CCG 

i i * * ■ ' 13^20 

CGCCGGTTCCGCGGTGTCGACGACGACGAGAGGrCACCGCTCGACTGGGGGAAGCGGGACGZ j jA" AGGC 

*AKAPOLLLLSSGE.T<»-iL». 3 
— ■ — »XcpU ■ 



ccggccccgagcgcgccgcgccggtgctgacgctggccagcgacggcttc3c:s-g:ccsas:"^cagca 

, . ■ ■ ■ — ► 13790 

GCCCGGCGCTCGCGCCGCGCGGCCACGACTGCGACCGGTCGCTGCCGAAGCGGC'CGGGC'CG-CGTC z~ 

AGRERGAPVLTLA SOGFAEPEl OO 
— ^ Xa*J • 

ggaaaagtcc:gatgaagcgcggccccggcttcaccctgctcgaggtgctggtggc::tggcgatctt:^ 

, i i i i ► 13860 

ccttttcagggctacttcgcgccccccccgaactgggacgagctccacgaccaccgcgaccgctagaagc 

E K S R . . 
xcpu ' 

.hkrcRGFTllEvlvala;* 

I XtpV — — — 



ccgtcgtcg:ccccagcg:gctcagcgccagcgctcgctcgctgaacaccgccccgcgcctggagcacaa 

, > >- 13930 

GGCACCAGCG3CGGTCGCACGAGTCGCGGTCGCGAGCGAGCGACTTCTGGCGGCGCGCGGACCTCCTGTT 

AVVAASVL SASAR SLKTAARLEOK 

' ' ■ XcpV 






CACC7TCGCCACCTGGC7CGCGCACAACCCCC7GCAGGAGC7GCAGC7CGCCCaCGTG::c:::GCGCGAG 

i i , ■ . t i ■ t i i t^OOO 

ctcgaagcggtggaccgaccgcctgttggcggacgtcctcgacgtcgaccggc7gcacgg:3:c;cgc:c 

TFA rwLADNRLOELOLADVP^G£ 



GGCCGCGAGCAGGGCCAGGAGAGCTACGCCGGGCGGCGCTGGCTGTGGCAGACCGAGGTGCiGGCCACCA 

| | i • t * * • * ■ * ■ — ■ * 14070 

CCGCCCCTCGTCCCGCTCC:CTCGATGCGGCCCGCCGCCACCGACACCGTCTCGCTCCACGTC:GGrGG7 

CRE0G EESTAGRRWLW0SEV04T 
GCCAGCCGGAGATGCTGCGTGTCACCCTACGGGTGCCGCTGCGCCCGGACCGCGGGCTGCAGGGCAAGAT 

, j , i I - 1 - - \ • f ■ 1 ■ 1 lit 140 

CGCTCGGCCTCTACGACCCACAGTGGCATCCCCACCGCGACGCCGGCCTCGCGCCCGACGTCCCGTTCTA 

SEPEHLRVTVRVAt_RPERCLOGIC I 

xcpv 

CGAACACCA7GCCCTGGTGACCCrGAGTGGCT::GTCGGGGTCGAGCCATGACGCAGCGCGG: r TC4CCC 

, . , * i i - ■ : i i ■ i . » 1^210 

CCTTCTGGTACGGGACCiC"GGACTCACCGAACCACCCCCAGCTCGGTACTCCGTCGCGCCGiAG:GGG 

M R 0 3 Z f T 
' Xcpw — 



EOHALVTLSGrVGVEP . 



TGCT6GAAG7GC7GA7CGICATCGCCATC TTCGCCCTGCTGGCCATGGCCACC"*ACCGCA*Gw~"GACAG ^^q Q 

acgaccttcacgactagcggtagcggtagaagcgggacgaccgctaccggtgga:ggcg:a:s-gc t g~: 

L L E V L ! i I AiFAL'-AMAI ' =? ~ J S 

CGTCCrGCAGACCGATC G'GGCCAGCGCCAGCAGGAGC AGCGTC7GCCCGAGC 'ZZ.ZZ ZlZZl^ZZl - 

, t - - --■„ — ■■■ „.«, . ... i * 143^0 

gcaccacgtgtggctaccaccggtcgcggtcg7cc7CC7cccagacccgc7cgac:g" j::;^c"«;cg: 

VL' , 70*G0fl03E0RLRE - " - i - a 

" XcpW 1 



GCT7TCSAAGCCGACCTGC7GCACG7GCCCC7GCG7CCGGTCCGCGACCCCC7GGGCGA:C ~GCC AG 

, , : i I * 1*1420 

cgaaagct7g:gctggacgacg7ccacccggacgcaggccacccgc7cggcgacc:gctggacgacgg:: 

AFEROLLOVRLRP VWD PlGDL L ? 



CCC7GCGCGGCAGCAG7GGCCGCGACACCCAGC7GGAG77CACCCGCAGCGGC7GGCCCAACCCGC7CGG 

, - i i 1 14490 

GGGACCCGCCGTCGTCACCGGCGC7G7GGG7CGACC7CAAGTGGGCG7CGCCGACCGCG77GGGCGAG:: 

ALHGSS5RDT0LEFTR5GWPNP!. G 
__________ — xgrtw 

CCAGCCGCGCGCCACCCACAGCGGG7GCGC7GGCAGCTCGAAGGCGAGCGC7GGCAGCGCCC 7TAC7GG 

i i ■ ' 1 ' 14560 

GGTCGGCGCGCGGTGGGA7G7CCCCCACGCGACCGTCGAGCT7CCGC7CGCGACCGTCGCGCGAATGA:; 

0 P R A T_ORV RW'OLEGERWO 9 A Y V 

»T"' 



3a 




4CGGTCCTGCACCAGGCCCAGGACACCCACC .GGCTGCAGCAGCCGCTCGATGGCGTGCCCCGCTTCG 

' ' ' » 1 I ' I ■ i ■ i I4t>_0 

TGCCACGACCTGSTCCGGGTCCrGrCGGTCGGCGCCCACGTCGTCCGCGACCrACCGCACGCGCCCAAGC 

T VL03A QDSOPRVOQALDGVR R F 
_____ XcpW ___________________________ 

ACTTCCCCTTTCTCGACCAG^AGGGGCGCTGCCTGCAGGACTGGCCGCCGGCCAACAGrGCTGCCGACGA 

' ' ' 1 >■ 1 ' ■ i ► I** 700 

TGAACGCGAAAGiGCTGGTCCrCCCCGCGACCCACGTCCTGACCGGCGGCCGCTTGTCACGACCGCTCCT 

0LRFL00EGRWLQ0WPPANSAA3E 

GCCCCTGACCCAGC TGCCGCGTGCCGTCGAGC TGGTCGTCGAGCACCCCCATTACGGTGAAC TGCGCCGT 

' 1 ■ * 1 1 ' i ■ i 1*770 

CCGGGACTCGGrCGACGCCCCACCGCACCTCGACCAGCAGCTCGTCGCGCTAATGCCACrTGACGCGGCA 

ALTOLPftAV ELVVEHRHYGELRR 

XcpW — , 

CTCTGGCGC""~CCGAGATGCCGCAGCAGGAACAGATCACGCCGCCCCGGGGCGAGCAGCGCGGTGAGC 

' — ; — • 1 « 1 1 1 1 ' ■ ■ I 1U8*40 

GAGACCGCGAACaGGCTCTACGGCGTCGTCCTTGTCTAGTGCGGCGGGCCCIC^CTCGTZCCGCCACTCG 

LWRL 3 £- H tP0OE 0 t '°PCC£CG 3 £ 

Xcpw — _______»________________ 



TGCTCCCGGAAGiGCCGGAGCCCGAGCCATGACCCGGCACCGCGCCGTGGCACT3ATCACCGTG;'GCTG 
1 . 1 . , r 1^910 

accacggccttctcggcctcgggctccctacrcggccgtcgccccgcaccgtgacragtggcacgacgac 
llpeepepea 



-xcpw I 



,rl 5R3r?G V A ^ : T '/'_ 
» XcpX — 



GTGGTGGCGCT^TGACCCTGGrCTGCGCCGCCCrGCTGC7GCGCCAGCJGCTGGCCA7::^:ia:ACCS 
t ■ — , mgao 

CACCACCGCGAC;ACrGGCACCAGACGCGCCGGGACGACGACGCGCTCGTCGA::3«:i3SCG":^:^aC 

vvauv: vvcaalllrqqla i ? s r 

1 " . ■ PI— 1 ■ ■■ ■■ I XCPX - ' ■ ■-!_■ ■ — ■ 

gcaaccacctg:'ggtgcgccacgcccagtactacgccgaacgcggcgagc:gctggccaaggc:ctgct 

. ' ' 1 ' — ' ■ 1- 15050 

cgttggtcgacgaccacgcggtccgggtcatcatcccgcttccgccgctccacgaccggttccgggacca 
cnqllvroaoyyaeggel l a k a l l 

■' ■■■■■■■ ■ ■ XcpX ■ - 1 . ■■ ■■■ 

GCGTCCCGACCTGGCCGCCGACCAGGTCGATCATCCCCGCGAGCCCTGGGCCAACCCCGGCCTGCCCTTC 

■ t 15120 

CGCACCGCTGGACCCGCCGCTCGTCCAGCTAGTACGCCCGCTCCCGACCCGGTTGGGGCCGCACGCGAAG 

RRD LAAOOV DHPGE PWANP G L R F 



CCCCTGGATGAGGGCGGCGAGCTGCGCCTGCCCATCCAGGACCTGCCCGGACGTTrCAACCTCAACAGCC 

1 ' — — — ► 15190 

GGGGACCTACTCICGCCGCTCGACGCCGACGCGTACCrcCrGCACCGCCCTCCAAAGrTGGAGTTGTCGG 

PLOEGGELRLR iEOLAGRFVLNS 

■ ■■■ ■ I- - I III XCPX 11 — ■! I. 



3AX 






TGCCCGCCCGrCGTGACGCCGGTGACTTGGCow 7GC TGCGCC TGCGGCGCC TCC TGC AGC T GC TGCAGC T 

, j i 1 1 iii 15260 

ACCGCCCGCCACCACrCCGGCCACTCAACCGCGACGACGCGGACGCCGCGCACGACGTCCACGACGTCGA 

t A A G C E A G E L A L LRLBRLLQLLOL 



GACCCCGGCCTATGCCGAGCSCCTCCAGG ACTGGCTCGACGGCGATCAGGAGGCCAGCGCCATGGCCGGC 
CTGCGGCCGCATACGGCTCGCGGACGTCCTCACCGACCTGCCGCTAGTCCTCCGCTCGCCGTACCGCCCG 

TPATAERLOOWLDGOOEASGMAG 



GCCGACCATGACCACTACCTGCTGCAGAAACCGCCCTACCGTACCCGCCCCGGCCCCATTCCCGAGGTGT 

i , — «~ 1 ■ ■ i i i ( ■ i 15400 

CCGCrCCTACTGCTCATGGACGACGTCTTTGGCGGCATCGCATGGCCGCGGCCCGCGTAACCGCTCCACA 

AEDDQYLL0KPPY9TGPCRi A E V 

■ ■ 1 ■ XcpX - ■ ■ ■ 



cggagc tgcgc z tgc tgc tgggc atgagcc acgccsactaccgccgcctcccccc:tt:gtcagcg::ct 

GCCTCGACGCGGACGACGACCCGTACrCGCTCCGGCTGArGGCGGCGGACCGGGGGAAGCAG'CGCGGGA 



SEL RLLLGM5EA0 Y^PLAPTvSa - 



GCCCAGCCAGGTCGAGCTGAACATCAACACCGCCACCGCCCT:GTGCTGGCTT3CC r GGGCGAGGGCATN 

! ; j l ■ I i I « 1 55^0 

CCCCTCGGTCCAGCTCGACTTGTAGTTGTGGCGGT;GCGGGACCACGACCGAACGGACCdC::c:GTAN 

PSOV' LN I N T A 5 A L VLAC. 2 Z Z ^ 
, — xcox 



CCCGAGGCGGT^CTCCAGGCCGCCATCGA^GGTCGCGGCCGCAGCGGCrATCGCGA3C:::::::cr-ca 

. .. l . ■ ■ ' 15610 

GCGCTCCGCCACGAGCTCCaGCGGTAGCTNCCAGCGCCGGCGT:G:CGATaGCGZ::G^G;Si:^GiiGC 

P E A v c a A I "> G 3 G 3 5 3 y R r p A a r 



TCCAGCANCTTGCCAGCTACGGCGTCAGCCCGCAGGGGCTGGGCATCGCCAGCC«G7AT7*'CGTG"I-C 

. — < f 1 * i « ► 15660 

AGCTCGTWGAACGGTCGATGCCGCAGTCGGCCGTCCCCGACCCGTAGCGGTCCGTCATAAAGCCACAGTG 



V0->LA5YCVSPOGLG I A S 0 Y F P. V T 



CACCGAGGTGCTGCTCGCTGAGCGCCGCCAGGrGCTGGCCAGTTATCTGCAACGTGGTAATCArGGGCGC 

i 15750 

CTGCCTCCACGACGACCCACTCGCCGCCGTCCACGACCGGTCAATAGACGTTGCACCArTACTACCCGCG 



TeVLLGERROVLASYLORGNDGR 



GTCCGCCTGATGGCGCGCGATCTGGGGCAG GAGGCCCTGGCCCCCCCACCCGTCGAGGAGTCCGAGAAAr 
CACGCGGACTACCGCGCGCTAGACCCCGTCCTCCCGGACCCCGGGGGTGGGCAGCTCCTCAuGCTCTTTA 



VRLrtAROLGOECL A P P P V E £ S E < 

xcpx ' 

3*y 






^ACTCTCCTCACCCrGTTT:'3CCCCCCCAC T^CACCCACGCCAGCCCCGACATCCCGGTCTOCTSC 



CTCAGACCAGTGCGACAAAGAC3GCGGGGTCCGGACGTCGCTCCGC7CGCGGC T GTACGGCCACACCACG 
SLL TLF w PPOACTEASAOnPV WC 



GTCGAGAGCCACAGCTCCCGTCAGCTGCCCTTCGCCGAGGCCTTGCCGGCCGACGCGCGGGTC-SCCGCT 

t | ; i t t . , i ■ ■ ■ ■ ■ t * 15960 

CAGCTCTCGCTGTCGACGGCiGTCGACGGGAAGCGGCTCCGGAACGGCCGGCTGCGCGCCCACACCGCGA 

VFSD5C ROLPF AEALPADARVWR 
— _ — __ — XcpY 



TGGTGCTCCCCCTCGAGGCGGTGACCACCTGTGTCGTGCAGTTGCCGACCACCAACGCACGCTCGCTGGC 

! i i ■ t - - t t 1 1 * 16030 

ACCACGACCCCCACCTCCGCCACTGCTGCACACAGCACGTCAACGCCTGGTGGTTCCGTGCCACCGACCG 

LVLPVEA VTTCVVOLPTTKARWL 4 

XcpY — - 

CAAGCCCCTGCCGTTCGCCoTC jAGGAGCTGCTGGCCGAGGAGGTGGAGCAGTTTC ACCTj T"G7~ ^37 

, ( i ... > ■ 16100 

gttccgggacggcaaccggcagctcctcgacgaccggc7cctcca:c7cgtcaaag:33ac«:::agcca 

KALPFAVEELLAEE V E 0 F H L t v G 

, , XcpY 1 



AGCGCGCTGCTCGATGGTCGT'AT'CGrGTTCATGCCCTGCGCCGCGAGTGGCTCGCCGGC TGG" 

, | , i 11 * ■ ■ ■ — *• 16170 

TCGCCCGACCAGCTACCAGCASrAGCACAAGTACGGGACGCGGCG:TCACCGACCGGCC3iCC^-::3:3 

SALVOG ^HRVMALS^E W L A G * •_ - 

— xcc>v 



TGTGCGGCGAGCGGCCGCCjtiGTGGATCGAGGrGGACoCCGACCTSTTGCCGG-GGAGCiC"- j IC-SC " 

,i ii - 16240 

acacgccgc7ccc:ggcgg:":acc7agctccacc7gcggctggacaacggcc:: : *: : : :;*: *;a 

LCGE^ p P3WlEvDAOLLPEE:£2L 

______ >XCPY — — — — — — — — — ^_ 



gctctccctgggcgagccc7gg7tgctcgccgcgtcgggcgacgcgigcctggccctg:s*gg:gagca: 

i i 1 1 1 I63t0 

CGAGACGGACCCGCTCGCCACCAACGAGCCGCCCAGCCCGCTCCGCGCGGACCGGGACGCACCSCTCCTi 

LCLGERWLLGGSGEARLAL^GED 

»~ v 



TCGCCGCAGCTGGCGGCGC7CTGTCCGCCCCCCCGGCAACCCTATGTGCCGCCCGGGCAGGCGGCGCCGC 

I i i r ■ 16380 

ACCGGCGTCGACCGCCGCGAGACAGGCCGCGGCGCCGTTCGGATACACGGCGGGCCCG7CCGCCGCGGCG 

W P 0 L A A 1 CPP p *Q AVVPPG0AAP 
— — XcpV 



CGGGCCTCCAGGCCTGCCAGACGCTGGAGCAGCCGTCGCTCTGGCTGGCCGCGCAGAAGrCCG^CTGCAA 

i i i i I ' ■ ' 16350 

GCCCCCACCTCCGGACGGTC';CGACCTCGTCGGCACCGAGACCGACCGGCGCG7C77CiGGCCGAC3:7 

PGVEACQTLE0PWLWLAA0K5GC N 






CCTCGCCCAGCCCCCTTTC3CCCCTCCCCA. TTCCGGCCAGTCSCAGCGCTG SCGGCCGCrGGCSGGG 

, I i I r i ■ ' 1,1 ' ' ' " ' ** !O5«0 

gcaccgggtcc::ggaaagcgggcacccctcgcaagcccggtcaccctc3Cgacccccggcgaccccccc 

LAOGPFAR REP SGQwQRWRP'.AG 



-XcpV - 



CTGCTCGGTCTCTGGCTGCTGCTGCAKTGGGGCTTCAACCTTGCCCANGGCrGGCAGCTGCAGCGCGAGG 
GACCAGCCAGACACCGACCACGACGTnACCCCGAAGTTGGAACGGGTNCCGACCGTCGACGTCGCGCTCC 

LLCLWLV L'WGFNLA^GWOLORE 

"XcpY ■ 



GTGAACGCTATGCCGTGGCCAACGAGGCGCTGTArCGCGAGCTGTTCCCCGAGCATCGCAAGGTGATCAA 

. ! i i t t t ' * • ' * - ■ *■ 16060 

CACTTGCGATACGGCACCCGTTGCTCCGCGACATAGCGCTCGACAAGGGGCTCCTAGCGrTCCACTAGTT 

G r RyA VANEALYRELFPEOff'CV ' * 
XCPY 

CCTGCGTGCGCAGTTCGACCAGCACC TGGCCGAGGCGGC TG3GAGCGGCCACAGCC AGTTGC TGGCCC TG 

, , , i i - * .... ? ... — ■■ ■— » 16730 

GGACGCACGCGTCAAGCTGGTCGTGGACCGGCTCCGCCGAGCCTCGCCGGTCTCGGTCAACGACCGGGAC 

iRAQFAOHL AEAAGSG0SG. .A1. 

XcpY 

ctcgatcaggccgccgcggccatccgcgaagggggggcgcaggtg:aggtgcatcagc:cgact:caacg 

iii r - ■ ' 1 ■ ■ " ' ' 1 ** 16600 

gagctagtccggcggcgccgg:agccgcttc:cccccgcgtccacgtcca:c:agtcgagc:gaagttg: 

LOOAAAAIGEGGAQVOVQ 3 LOF** 

J: xcpy - 



CCCACCGTGGCGACC TGGCZ T T C AACCTGCGTGCCAGCGAC f TC3CCGCGC 7GGAAAGCI "3CGGGCGZG 

, t - i - — • i 1 .I.. - <^ loo/u 

gggtcgcaccgctggaccggaagttggacgcacggt:gc7gaagcggcgcgac:'ttc:ga:g:c:g:g: 

AORGDLAr N L A^ 5 D c aa l £S- : ?aJ 

c ctgcaggagg:cggcctggcggtggacatgggctcggcgagccg:gaggacaacggcg':ag:g:g:gc ^ g ^ Q 

GCACGTCCTCCGGCCGGACCGCCACCTGIACCCGAGCCGCTCGGCGCTCCTGrTCCCGCAGTCACGCGCG 
LOEACLAVOMG S ASREPNG vSAR 

^ — — — • — — XcpY ■ 

CTGGTCATCGGGGGTAACGGATGAACGGCCTGCTCATGCAATGGCAAGCGCGCCTGGCGCAGAACCGTTT 
GACCACTACCCCCCATTGCCTACTTGCCGGACGAGTACGTTACCCTTCGCGCGGACCGCGTCTTGGGAAA 



L V I G G M G 
XcpY 



nNCLtttOWQARLAQNP 
XcoZ 



gatgctgcgctggcagggcctg:cgccacgcgaccggct3gccc:gggcctgc7cgctg:::tcctg:^g 
ctacgacgcga:cgtcccggacggcgg:gcgctggccgaccggga::cggacgagcgacggaaggacaa: 

mlRwqglpprdrlalgllaafll 

XcpZ 






CTCCTGCrCCTSTACCTGTTGCTGTGCCGG^ ;7CAGCCAGAACCrGGACCCGGCCCGCGGCTrCCrGC 
CACCACGACGACArCGACAACGACACCGCCCGCCAGrCGarCTTGGACCTCGCCC^CCCGCCGiAGCACG 

LVLL YLLLWRPV SDNLERARGTL 



AGCAGCAGCCTACGCTGCACCCCTACCTGCACCAGCATGCACCGCAGGTCCGGGCACCGCAGGTCGCACC 

( i ■ * 1 ► J 7220 

TCGTCGTCGCATGCGACGTGCGGATGGACGTCCTCGTACGTGGCGTCCACGCCCGTGCCGTCCAGCGTCG 

0 0 0 R T L M A fLOEHAPOVRAROVAP 

CCAGGCCAGTATCGACCCTGCCGCGCTGCACGCGTTGGTGACCGCCAGTGCCGCCAGCCAGGGGCTGAAT 

i , ■ ■ > ■ t ■ i 1 1 ■ » 17290 

CGTCCCGTCATAGCTCGGACCGCGCGACGTCCCCAACCACTGGCCCTCACGGCCGTCCGTCCCCGACTTa 

OASIEPAALOGLVTASAASOGLN 



GTCGAGCGTCTaGACAACCAGGGTGATGGTGCCCTGCAGGTGAGCCTGCAGCCCaTCGAG — CaCCCGTC 

,11 >■ ■ i ' » ■ 1 > 17360 

CACCTCGCACACCTGTTGGTCCCACTACCACCGGACGTCCACTCGCACCTCaGCCACCrCAAGCGGCCAG 

VERLON'OCOGGLOVSL 0 P V E ^ * R 

■ xcoZ 



TGCTCCAGTGGCTGGTCAGCCrGCAGCAGCAGGSCGTGCGIGTCCAAGAGGCCCGTCTaGAACGTGCCGA 

x , 1 — — — - ■ ' )7M30 

ACGACGTCACC jACCACrCGGACGTCCTCGTCCCGCACGCGCAGCTTCTCCGGCCAGACCTT-j-'ACGGCT 

llomlvsloeogvrveeaglerad 

XcpZ 

CAACGCGCTCGTGAGCAGCCaCCrGCrGCTGCGTGCCGGTTGAGCCCGGCTGCACCAGGCGAGTGCGTCG 

! - ■ — >■ ' ■ t ► 17500 

GTTCCCCGACCiCT , CGTCGGCGCACGACGACGCACGCCCAACTCGGGCCGACCT , GGTCCGCTC--GCAGC 

KGL V SS^LL LRAG . . 

»' XCOZ 
TCACCGGCC7GGCCGCCGAGCTCGACGCGCTG:GC;3:r-:CA:;aCCAGACC:TG:AGGAiCToC;;A^ 
YNLC0P^HLAKCD :> a : 3 l.A: f : S 

— UpO 



CTCCTGTCCAACGCCCGCGATGCCTCGCCGGCCGGCGGTGCCATCCGCGTGCGTiGCGAGGCIGASGi^I ^ 
GACCACAGGTr^CGGGCGC7ACGGAGCCGCCGGCCGCCACGGrAGGCGCACGCA::^CT:C3^:*::*:S 

|_ • SNARSASPAGOAl^V^SEi-:-: 
— UjQ 



ACAGCGTGGTGCTGArCG~3^AGGACGAGGGlACGGGCA7 T "CCGCAGGCGATCAT ^SA WW SC"3* "3i 

- . 1 !90 

tctcgcaccacgactagcagc7Cctgctcccgtgcccgtaaggcgtccgc:agta:ct'ggcggacaa::- 

OSVVLIVEDEGTG I P 0 A I * 0 * ~ £ 
ACAAACCACGCACCTATTTGGGGGGGAGCTTCCGCTCCCCCAGTAGCTTCACCCCACCTCGCGrTCCCCA 

, . . 1 i 1 ■ ■ ► 3*30 